#EXERCISE 1.1: Summarize in your own words the biological background as per described in Pristner M, et al.
Part 1 answers are based on scientific evidence cited from the relevent publication: “Neuroactive metabolites and bile acids are altered in extremely premature infants with brain injury” Manuel Pristner, Daniel Wasinger, David Seki, Katrin Klebermaß-Schrehof, Angelika Berger, David Berry, Lukas Wisgrill, Benedikt Warth medRxiv 2023.05.17.23290088; doi: https://doi.org/10.1101/2023.05.17.23290088 Now published in Cell Reports Medicine doi: 10.1016/j.xcrm.2024.101480
The experimental design and biological background is as follows: The gut microbiome is associated with pathological neurophysiological evolvement in extremely premature infants suffering from brain injury. To elucidate the early-life biological phenomeno involved,the plasma and fecal metabolome of a cohort of 51 extremely premature infants with brain injury and the correlation with gut microbiome and immunological data was studied via liquid chromatography (LC)-high-resolution mass spectrometry (MS)-based untargeted metabolomics and LC-MS/MS-based targeted analysis. For our specific analysis, 75 fecal samples that were collected at 3 different time points (i.e. 7 days, 28 days, and term_age) were analyzed, along with 34 QC samples, using hydrophillic chromatography-negative ion mode ESI MS and MS/MS.An early onset of differentiation in neuroactive metabolites between infants with and without brain injury and several bacterially derived bile acid amino acid conjugates in plasma and feces were detected.
#EXERCISE 1.2: What type of metabolomic approaches were used to analyse the fecal samples? Were they untargeted or a targeted?
Both targeted and untargeted approaches were used to analyze fecal samples.Targeted metabolomics applies methods that have been optimized to quantify a defined set of molecules, and such approach is well suited for research questions that require measuring all of a small number of analytes in a sample (e.g., does this drug affect a specific pathway of interest? do these samples contain a given pesticide? is this compound a marker for a particular phenotype?). Targeted metabolomics apply tandem mass spectrometers like triple quadrupoles (QqQ)to quantitate less than a few dozen compounds, but some workflows have been developed to monitor several hundred metabolites.
#EXERCISE 1.3: For the untargeted analysis, what type of chromatography was used? Was it normal and/or reverse phase?
For untargeted analysis, both (normal) Hydrophillic and (relatively more hydrophobic) reverse-phase chromatography were used Two columns, a HILIC column(SeQuant ZIC-pHILIC; 5um, 150x2.1mm) and a RP column (Waters Acquity HSST3; 1.8um,100x2.1mm) were used for complementary and broad coverage of analytes.
#EXERCISE 1.4: What type of MS-instrument was used in either case?
For untargeted metabolomics:VAnquish UHPLC system coupled to a QExactive HF quadrupole-Orbitrap(Thermo) mass spectrometer via an ESI interface was used. For targeted metabolomic, TSQ Vantage Triple Quadrupole mass spectrometer (Thermo) via an ESI interface was used This is later confirmed below via the following code: mbank_sub[2]$instrument [1] “LTQ Orbitrap XL, Thermo Scientfic; HP-1100 HPLC, Agilent”
#EXERCISE 1.5 What type of ionization and ionization modes were used?
Electro-spray ionionization (ESI) was used and negative ionization mode (polarity=0) were used. Later Confirm polarity used for acquisition: polarities <- table(mz.conditions$polarity) head(polarities) 0-negative, 1-positive, -1=unkown 0L for negative polarity, 1L for positive polarity
#EXERCISE 2.1: How many replicates were measured for each time point? (Refer to s_MTBLS7560.txt in Metabolight for the experimental desing) #How many experimental groups are there?. Determine the number of samples per group.
There are 7 experimental groups (“CTR_day28” “CTR_day7” “CTR_termage” “PAT_day28” “PAT_day7” “PAT_termage” “QC”), each have, respectively 31, 4, 13, 11, 8, 8, and 34 replicates measured for each time point or QC. This was determined and confirmed by by the following rationale and code: By referring to cited scientification publication and associated file s_MTBLS7560.txt in Metabolight, I organized the downloaded samples via the following directory:
#feces_ID_1 day_7 PAT #feces_ID_2 day_7 CTR #feces_ID_3 day_7 PAT
#feces_ID_4 day_7 PAT #feces_ID_5 day_7 PAT #feces_ID_6 day_7 PAT
#feces_ID_7 day_7 CTR #feces_ID_8 day_7 PAT #feces_ID_9 day_7 CTR
#feces_ID_10 day_7 CTR #feces_ID_11 day_7 PAT #feces_ID_12 day_7 PAT
#feces_ID_13 day_28 CTR #feces_ID_14 day_28 PAT #feces_ID_15 day_28 CTR
#feces_ID_16 day_28 PAT #feces_ID_17 day_28 PAT #feces_ID_18 day_28 CTR
#feces_ID_19 day_28 CTR #feces_ID_20 day_28 CTR #feces_ID_21 day_28 CTR
#feces_ID_22 day_28 CTR #feces_ID_23 day_28 CTR #feces_ID_24 day_28 CTR
#feces_ID_25 day_28 CTR #feces_ID_26 day_28 CTR #feces_ID_27 day_28 CTR
#feces_ID_28 day_28 CTR #feces_ID_29 day_28 CTR #feces_ID_30 day_28 CTR
#feces_ID_31 day_28 CTR #feces_ID_32 day_28 CTR #feces_ID_33 day_28 PAT
#feces_ID_34 day_28 CTR #feces_ID_35 day_28 CTR #feces_ID_36 day_28 CTR
#feces_ID_37 day_28 CTR #feces_ID_38 day_28 PAT #feces_ID_39 day_28 CTR
#feces_ID_40 day_28 CTR #feces_ID_41 day_28 PAT #feces_ID_42 day_28 PAT
#feces_ID_43 day_28 CTR #feces_ID_44 day_28 CTR #feces_ID_45 day_28 PAT
#feces_ID_46 day_28 CTR #feces_ID_47 day_28 PAT #feces_ID_48 day_28 CTR
#feces_ID_49 day_28 CTR #feces_ID_50 day_28 CTR #feces_ID_51 day_28 CTR
#feces_ID_52 day_28 PAT #feces_ID_53 day_28 CTR #feces_ID_54 day_28 PAT
#feces_ID_55 term_age CTR #feces_ID_56 term_age PAT #feces_ID_57
term_age CTR #feces_ID_58 term_age PAT #feces_ID_59 term_age CTR
#feces_ID_60 term_age CTR #feces_ID_61 term_age CTR #feces_ID_62
term_age PAT #feces_ID_63 term_age PAT #feces_ID_64 term_age CTR
#feces_ID_65 term_age PAT #feces_ID_66 term_age PAT #feces_ID_67
term_age CTR #feces_ID_68 term_age CTR #feces_ID_69 term_age CTR
#feces_ID_70 term_age PAT #feces_ID_71 term_age PAT #feces_ID_72
term_age CTR #feces_ID_73 term_age CTR #feces_ID_74 term_age CTR
#feces_ID_75 term_age CTR #feces_targeted_reverse_Matrix
#feces_targeted_reverse_QC #feces_targeted_reverse_Blank
#feces_untargeted_hilic_Process_blank_1
#feces_untargeted_hilic_Process_blank_2
#feces_untargeted_hilic_Process_blank_3 #feces_untargeted_hilic_QC
#feces_untargeted_hilic_Solvent_blank
#feces_untargeted_hilic_Solvent_blank_carry_over
#feces_untargeted_hilic_SRM #feces_untargeted_hilic_STD_mix
#feces_untargeted_reverse_Process_blank_1
#feces_untargeted_reverse_Process_blank_2
#feces_untargeted_reverse_Process_blank_3 #feces_untargeted_reverse_QC
#feces_untargeted_reverse_QC_cond
#feces_untargeted_reverse_Solvent_blank
#feces_untargeted_reverse_Solvent_blank_carry_over
#feces_untargeted_reverse_SRM
#feces_untargeted_reverse_STD_mix
#The number of groups and replicates are confirmed as follows:
pData(raw_data)$class
## [1] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [7] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [13] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [19] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [25] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [31] CTR_day28 CTR_day7 CTR_day7 CTR_day7 CTR_day7 CTR_termage
## [37] CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage
## [43] CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage
## [49] PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day28
## [55] PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day7
## [61] PAT_day7 PAT_day7 PAT_day7 PAT_day7 PAT_day7 PAT_day7
## [67] PAT_day7 PAT_termage PAT_termage PAT_termage PAT_termage PAT_termage
## [73] PAT_termage PAT_termage PAT_termage QC QC QC
## [79] QC QC QC QC QC QC
## [85] QC QC QC QC QC QC
## [91] QC QC QC QC QC QC
## [97] QC QC QC QC QC QC
## [103] QC QC QC QC QC QC
## [109] QC
## Levels: CTR_day28 CTR_day7 CTR_termage PAT_day28 PAT_day7 PAT_termage QC
table(pData(raw_data)$class)
##
## CTR_day28 CTR_day7 CTR_termage PAT_day28 PAT_day7 PAT_termage
## 31 4 13 11 8 8
## QC
## 34
class(raw_data$class)
## [1] "factor"
nlevels(raw_data$class)
## [1] 7
length(unique(raw_data$class))
## [1] 7
levels(pData(raw_data)$class)
## [1] "CTR_day28" "CTR_day7" "CTR_termage" "PAT_day28" "PAT_day7"
## [6] "PAT_termage" "QC"
#Convert Single Factor Vector to Character Vector
character_vector <- as.character(raw_data$class)
character_vector
## [1] "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28"
## [6] "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28"
## [11] "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28"
## [16] "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28"
## [21] "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28"
## [26] "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28" "CTR_day28"
## [31] "CTR_day28" "CTR_day7" "CTR_day7" "CTR_day7" "CTR_day7"
## [36] "CTR_termage" "CTR_termage" "CTR_termage" "CTR_termage" "CTR_termage"
## [41] "CTR_termage" "CTR_termage" "CTR_termage" "CTR_termage" "CTR_termage"
## [46] "CTR_termage" "CTR_termage" "CTR_termage" "PAT_day28" "PAT_day28"
## [51] "PAT_day28" "PAT_day28" "PAT_day28" "PAT_day28" "PAT_day28"
## [56] "PAT_day28" "PAT_day28" "PAT_day28" "PAT_day28" "PAT_day7"
## [61] "PAT_day7" "PAT_day7" "PAT_day7" "PAT_day7" "PAT_day7"
## [66] "PAT_day7" "PAT_day7" "PAT_termage" "PAT_termage" "PAT_termage"
## [71] "PAT_termage" "PAT_termage" "PAT_termage" "PAT_termage" "PAT_termage"
## [76] "QC" "QC" "QC" "QC" "QC"
## [81] "QC" "QC" "QC" "QC" "QC"
## [86] "QC" "QC" "QC" "QC" "QC"
## [91] "QC" "QC" "QC" "QC" "QC"
## [96] "QC" "QC" "QC" "QC" "QC"
## [101] "QC" "QC" "QC" "QC" "QC"
## [106] "QC" "QC" "QC" "QC"
#Run Length Encoding
rle(sort(character_vector))
## Run Length Encoding
## lengths: int [1:7] 31 4 13 11 8 8 34
## values : chr [1:7] "CTR_day28" "CTR_day7" "CTR_termage" "PAT_day28" "PAT_day7" ...
#EXERCISE 2.2 How much did the entire chromatographic run span for
each of them?
For ALL samples combined, chromatography was recorded from 0:00 to 15:01
minutes (We later calculate chromatographic run time for EACH sample)
Confirmed via the “XCMSnExp” S4 object raw_data (also xdata the xcms S4
object)
raw_data
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 30.09 Mb
## - - - Spectra data - - -
## MS level(s): 1 2
## Number of spectra: 107267
## MSn retention times: 0:00 - 15:01 minutes
## - - - Processing information - - -
## Data loaded [Fri Jun 14 11:07:33 2024]
## MSnbase version: 2.30.1
## - - - Meta data - - -
## phenoData
## rowNames: 025_ID_13 029_ID_15 ... 119_QC (109 total)
## varLabels: class
## varMetadata: labelDescription
## Loaded from:
## [1] 025_ID_13.mzML... [109] 119_QC.mzML
## Use 'fileNames(.)' to see all files.
## protocolData: none
## featureData
## featureNames: F001.S001 F001.S002 ... F109.S896 (107267 total)
## fvarLabels: fileIdx spIdx ... spectrum (35 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
#Confirmed as follows:
#Retention times of all spectra
head(rtime(raw_data))
## F001.S001 F001.S002 F001.S003 F001.S004 F001.S005 F001.S006
## 0.26276 1.41275 2.44838 3.44724 4.45024 5.46111
#ADDITIONAL PREMINARY DATA EXPLORATION TO PROVIDE RATIONALE TO ANSWERS FOR LATER EXERCISES
#Explore rawData "onDisk" object (S4 object serving as abstraction of MS data)
class(raw_data)
## [1] "OnDiskMSnExp"
## attr(,"package")
## [1] "MSnbase"
dim(raw_data)
## [1] 109 1
head(raw_data)
## MSn experiment data ("OnDiskMSnExp")
## Object size in memory: 0.03 Mb
## - - - Spectra data - - -
## MS level(s): 1
## Number of spectra: 6
## MSn retention times: 0:00 - 0:05 minutes
## - - - Processing information - - -
## Data loaded [Fri Jun 14 11:07:33 2024]
## MSnbase version: 2.30.1
## - - - Meta data - - -
## phenoData
## rowNames: 025_ID_13
## varLabels: class
## varMetadata: labelDescription
## Loaded from:
## 025_ID_13.mzML
## protocolData: none
## featureData
## featureNames: F001.S001 F001.S002 ... F001.S006 (6 total)
## fvarLabels: fileIdx spIdx ... spectrum (35 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
#str(raw_data)
raw_data$class
## [1] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [7] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [13] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [19] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [25] CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28 CTR_day28
## [31] CTR_day28 CTR_day7 CTR_day7 CTR_day7 CTR_day7 CTR_termage
## [37] CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage
## [43] CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage CTR_termage
## [49] PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day28
## [55] PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day28 PAT_day7
## [61] PAT_day7 PAT_day7 PAT_day7 PAT_day7 PAT_day7 PAT_day7
## [67] PAT_day7 PAT_termage PAT_termage PAT_termage PAT_termage PAT_termage
## [73] PAT_termage PAT_termage PAT_termage QC QC QC
## [79] QC QC QC QC QC QC
## [85] QC QC QC QC QC QC
## [91] QC QC QC QC QC QC
## [97] QC QC QC QC QC QC
## [103] QC QC QC QC QC QC
## [109] QC
## Levels: CTR_day28 CTR_day7 CTR_termage PAT_day28 PAT_day7 PAT_termage QC
#spectra(raw_data)
## Extracting individual spectra or a subset is much faster.
#spectra(raw_data[1:50])
class(pheno)
## [1] "data.frame"
dim(pheno)
## [1] 109 1
head(pheno)
## class
## 025_ID_13 CTR_day28
## 029_ID_15 CTR_day28
## 034_ID_18 CTR_day28
## 037_ID_19 CTR_day28
## 040_ID_20 CTR_day28
## 042_ID_21 CTR_day28
# Get length of data (total number of spectra)
length(raw_data)
## [1] 107267
#Get the MS level
head(msLevel(raw_data))
## F001.S001 F001.S002 F001.S003 F001.S004 F001.S005 F001.S006
## 1 1 1 1 1 1
## Get featureData, use fData to return as a data.frame
head(fData(raw_data))
## fileIdx spIdx smoothed seqNum acquisitionNum msLevel polarity
## F001.S001 1 1 NA 1 1 1 0
## F001.S002 1 2 NA 2 3 1 0
## F001.S003 1 3 NA 3 5 1 0
## F001.S004 1 4 NA 4 7 1 0
## F001.S005 1 5 NA 5 9 1 0
## F001.S006 1 6 NA 6 11 1 0
## originalPeaksCount totIonCurrent retentionTime basePeakMZ
## F001.S001 1017 25075350 0.26276 116.9286
## F001.S002 965 37729140 1.41275 116.9286
## F001.S003 1008 24703222 2.44838 116.9286
## F001.S004 1013 36653096 3.44724 116.9286
## F001.S005 935 30202434 4.45024 116.9286
## F001.S006 1046 38604088 5.46111 116.9286
## basePeakIntensity collisionEnergy ionisationEnergy lowMZ highMZ
## F001.S001 2943961 NA 0 64.00056 652.7924
## F001.S002 4164285 NA 0 63.99261 831.1013
## F001.S003 3102831 NA 0 62.58697 641.7801
## F001.S004 4053742 NA 0 63.77844 873.0706
## F001.S005 3491318 NA 0 63.99261 842.8361
## F001.S006 4343667 NA 0 63.46862 799.4724
## precursorScanNum precursorMZ precursorCharge precursorIntensity
## F001.S001 NA NA NA NA
## F001.S002 NA NA NA NA
## F001.S003 NA NA NA NA
## F001.S004 NA NA NA NA
## F001.S005 NA NA NA NA
## F001.S006 NA NA NA NA
## mergedScan mergedResultScanNum mergedResultStartScanNum
## F001.S001 NA NA NA
## F001.S002 NA NA NA
## F001.S003 NA NA NA
## F001.S004 NA NA NA
## F001.S005 NA NA NA
## F001.S006 NA NA NA
## mergedResultEndScanNum injectionTime filterString
## F001.S001 NA 0 <NA>
## F001.S002 NA 0 <NA>
## F001.S003 NA 0 <NA>
## F001.S004 NA 0 <NA>
## F001.S005 NA 0 <NA>
## F001.S006 NA 0 <NA>
## spectrumId centroided
## F001.S001 controllerType=0 controllerNumber=1 scan=1 TRUE
## F001.S002 controllerType=0 controllerNumber=1 scan=3 TRUE
## F001.S003 controllerType=0 controllerNumber=1 scan=5 TRUE
## F001.S004 controllerType=0 controllerNumber=1 scan=7 TRUE
## F001.S005 controllerType=0 controllerNumber=1 scan=9 TRUE
## F001.S006 controllerType=0 controllerNumber=1 scan=11 TRUE
## ionMobilityDriftTime isolationWindowTargetMZ
## F001.S001 NA NA
## F001.S002 NA NA
## F001.S003 NA NA
## F001.S004 NA NA
## F001.S005 NA NA
## F001.S006 NA NA
## isolationWindowLowerOffset isolationWindowUpperOffset
## F001.S001 NA NA
## F001.S002 NA NA
## F001.S003 NA NA
## F001.S004 NA NA
## F001.S005 NA NA
## F001.S006 NA NA
## scanWindowLowerLimit scanWindowUpperLimit spectrum
## F001.S001 64.00056 652.7924 1
## F001.S002 63.99261 831.1013 2
## F001.S003 62.58697 641.7801 3
## F001.S004 63.77844 873.0706 4
## F001.S005 63.99261 842.8361 5
## F001.S006 63.46862 799.4724 6
#which file the spectra are
head(fromFile(raw_data))
## F001.S001 F001.S002 F001.S003 F001.S004 F001.S005 F001.S006
## 1 1 1 1 1 1
#file names:
head(fileNames(raw_data))
## [1] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day28\\025_ID_13.mzML"
## [2] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day28\\029_ID_15.mzML"
## [3] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day28\\034_ID_18.mzML"
## [4] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day28\\037_ID_19.mzML"
## [5] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day28\\040_ID_20.mzML"
## [6] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day28\\042_ID_21.mzML"
## Scan index and acquisitionNum
head(scanIndex(raw_data))
## F001.S001 F001.S002 F001.S003 F001.S004 F001.S005 F001.S006
## 1 2 3 4 5 6
head(acquisitionNum(raw_data))
## F001.S001 F001.S002 F001.S003 F001.S004 F001.S005 F001.S006
## 1 3 5 7 9 11
#Retention times of all spectra
#head(rtime(raw_data))
# Get the polarity of spectra.
head(polarity(raw_data))
## F001.S001 F001.S002 F001.S003 F001.S004 F001.S005 F001.S006
## 0 0 0 0 0 0
#Extract intensities and M/Z values per spectrum, returning a list, each element representing the values for one spectrum.
#head(intensity(raw_data))
#head(mz(raw_data))
#Spectra are organized sequentially (i.e., not by file) but the fromFile() function is used to get for each spectrum
#the information to which of the data files it belongs. Below we simply count the number of spectra per file.
table(fromFile(raw_data))
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 888 899 894 893 894 888 892 857 888 894 898 899 893 894 889 889
## 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
## 889 890 892 895 899 896 891 890 888 893 896 884 889 894 891 900
## 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
## 877 889 886 896 883 898 890 883 896 894 888 892 894 888 900 898
## 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
## 893 895 886 891 894 894 894 892 891 891 852 888 892 889 893 888
## 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
## 896 892 903 895 888 900 898 895 897 891 893 6232 6072 825 822 824
## 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
## 837 837 844 848 892 895 896 895 897 896 895 893 896 895 892 887
## 97 98 99 100 101 102 103 104 105 106 107 108 109
## 870 893 894 886 893 895 895 836 897 897 899 897 896
#Total number MS spectra these files contained and proportions of them as MS and MS/MS spectra:
mz.conditions <- MSnbase::fData(raw_data)
head(mz.conditions)
## fileIdx spIdx smoothed seqNum acquisitionNum msLevel polarity
## F001.S001 1 1 NA 1 1 1 0
## F001.S002 1 2 NA 2 3 1 0
## F001.S003 1 3 NA 3 5 1 0
## F001.S004 1 4 NA 4 7 1 0
## F001.S005 1 5 NA 5 9 1 0
## F001.S006 1 6 NA 6 11 1 0
## originalPeaksCount totIonCurrent retentionTime basePeakMZ
## F001.S001 1017 25075350 0.26276 116.9286
## F001.S002 965 37729140 1.41275 116.9286
## F001.S003 1008 24703222 2.44838 116.9286
## F001.S004 1013 36653096 3.44724 116.9286
## F001.S005 935 30202434 4.45024 116.9286
## F001.S006 1046 38604088 5.46111 116.9286
## basePeakIntensity collisionEnergy ionisationEnergy lowMZ highMZ
## F001.S001 2943961 NA 0 64.00056 652.7924
## F001.S002 4164285 NA 0 63.99261 831.1013
## F001.S003 3102831 NA 0 62.58697 641.7801
## F001.S004 4053742 NA 0 63.77844 873.0706
## F001.S005 3491318 NA 0 63.99261 842.8361
## F001.S006 4343667 NA 0 63.46862 799.4724
## precursorScanNum precursorMZ precursorCharge precursorIntensity
## F001.S001 NA NA NA NA
## F001.S002 NA NA NA NA
## F001.S003 NA NA NA NA
## F001.S004 NA NA NA NA
## F001.S005 NA NA NA NA
## F001.S006 NA NA NA NA
## mergedScan mergedResultScanNum mergedResultStartScanNum
## F001.S001 NA NA NA
## F001.S002 NA NA NA
## F001.S003 NA NA NA
## F001.S004 NA NA NA
## F001.S005 NA NA NA
## F001.S006 NA NA NA
## mergedResultEndScanNum injectionTime filterString
## F001.S001 NA 0 <NA>
## F001.S002 NA 0 <NA>
## F001.S003 NA 0 <NA>
## F001.S004 NA 0 <NA>
## F001.S005 NA 0 <NA>
## F001.S006 NA 0 <NA>
## spectrumId centroided
## F001.S001 controllerType=0 controllerNumber=1 scan=1 TRUE
## F001.S002 controllerType=0 controllerNumber=1 scan=3 TRUE
## F001.S003 controllerType=0 controllerNumber=1 scan=5 TRUE
## F001.S004 controllerType=0 controllerNumber=1 scan=7 TRUE
## F001.S005 controllerType=0 controllerNumber=1 scan=9 TRUE
## F001.S006 controllerType=0 controllerNumber=1 scan=11 TRUE
## ionMobilityDriftTime isolationWindowTargetMZ
## F001.S001 NA NA
## F001.S002 NA NA
## F001.S003 NA NA
## F001.S004 NA NA
## F001.S005 NA NA
## F001.S006 NA NA
## isolationWindowLowerOffset isolationWindowUpperOffset
## F001.S001 NA NA
## F001.S002 NA NA
## F001.S003 NA NA
## F001.S004 NA NA
## F001.S005 NA NA
## F001.S006 NA NA
## scanWindowLowerLimit scanWindowUpperLimit spectrum
## F001.S001 64.00056 652.7924 1
## F001.S002 63.99261 831.1013 2
## F001.S003 62.58697 641.7801 3
## F001.S004 63.77844 873.0706 4
## F001.S005 63.99261 842.8361 5
## F001.S006 63.46862 799.4724 6
class(mz.conditions)
## [1] "data.frame"
dim(mz.conditions)
## [1] 107267 35
head(mz.conditions)
## fileIdx spIdx smoothed seqNum acquisitionNum msLevel polarity
## F001.S001 1 1 NA 1 1 1 0
## F001.S002 1 2 NA 2 3 1 0
## F001.S003 1 3 NA 3 5 1 0
## F001.S004 1 4 NA 4 7 1 0
## F001.S005 1 5 NA 5 9 1 0
## F001.S006 1 6 NA 6 11 1 0
## originalPeaksCount totIonCurrent retentionTime basePeakMZ
## F001.S001 1017 25075350 0.26276 116.9286
## F001.S002 965 37729140 1.41275 116.9286
## F001.S003 1008 24703222 2.44838 116.9286
## F001.S004 1013 36653096 3.44724 116.9286
## F001.S005 935 30202434 4.45024 116.9286
## F001.S006 1046 38604088 5.46111 116.9286
## basePeakIntensity collisionEnergy ionisationEnergy lowMZ highMZ
## F001.S001 2943961 NA 0 64.00056 652.7924
## F001.S002 4164285 NA 0 63.99261 831.1013
## F001.S003 3102831 NA 0 62.58697 641.7801
## F001.S004 4053742 NA 0 63.77844 873.0706
## F001.S005 3491318 NA 0 63.99261 842.8361
## F001.S006 4343667 NA 0 63.46862 799.4724
## precursorScanNum precursorMZ precursorCharge precursorIntensity
## F001.S001 NA NA NA NA
## F001.S002 NA NA NA NA
## F001.S003 NA NA NA NA
## F001.S004 NA NA NA NA
## F001.S005 NA NA NA NA
## F001.S006 NA NA NA NA
## mergedScan mergedResultScanNum mergedResultStartScanNum
## F001.S001 NA NA NA
## F001.S002 NA NA NA
## F001.S003 NA NA NA
## F001.S004 NA NA NA
## F001.S005 NA NA NA
## F001.S006 NA NA NA
## mergedResultEndScanNum injectionTime filterString
## F001.S001 NA 0 <NA>
## F001.S002 NA 0 <NA>
## F001.S003 NA 0 <NA>
## F001.S004 NA 0 <NA>
## F001.S005 NA 0 <NA>
## F001.S006 NA 0 <NA>
## spectrumId centroided
## F001.S001 controllerType=0 controllerNumber=1 scan=1 TRUE
## F001.S002 controllerType=0 controllerNumber=1 scan=3 TRUE
## F001.S003 controllerType=0 controllerNumber=1 scan=5 TRUE
## F001.S004 controllerType=0 controllerNumber=1 scan=7 TRUE
## F001.S005 controllerType=0 controllerNumber=1 scan=9 TRUE
## F001.S006 controllerType=0 controllerNumber=1 scan=11 TRUE
## ionMobilityDriftTime isolationWindowTargetMZ
## F001.S001 NA NA
## F001.S002 NA NA
## F001.S003 NA NA
## F001.S004 NA NA
## F001.S005 NA NA
## F001.S006 NA NA
## isolationWindowLowerOffset isolationWindowUpperOffset
## F001.S001 NA NA
## F001.S002 NA NA
## F001.S003 NA NA
## F001.S004 NA NA
## F001.S005 NA NA
## F001.S006 NA NA
## scanWindowLowerLimit scanWindowUpperLimit spectrum
## F001.S001 64.00056 652.7924 1
## F001.S002 63.99261 831.1013 2
## F001.S003 62.58697 641.7801 3
## F001.S004 63.77844 873.0706 4
## F001.S005 63.99261 842.8361 5
## F001.S006 63.46862 799.4724 6
ms_levels <- table(mz.conditions$msLevel)
head(ms_levels)
##
## 1 2
## 103099 4168
#103099 MS1 and 4168 MS2 (MS/MS)
#Collision energies for fragmentation of pre-cursor ions for MS/MS data
ce <- table(mz.conditions$collisionEnergy)
head(ce)
##
## 30
## 4168
#So 4168 MS/MS spectra were measured with 30eV
#Explore rawData "onDisk" object
#m/z range acquired:
min(raw_data@featureData@data$lowMZ)
## [1] 50.00167
max(raw_data@featureData@data$highMZ)
## [1] 899.9885
#MS-acquisition mode:
table(table(raw_data@featureData@data$msLevel))
##
## 4168 103099
## 1 1
polarities <- table(mz.conditions$polarity)
head(polarities) #EXERCISE 1.5
##
## 0
## 107267
#EXERCISE 2.3: Plot total ion chromatograms for all samples and color-code them according to different fecal collection time points.
Please refer to Figure 1 and associated rationale and code below:
#The chromatogram() method returns a MChromatograms object that organizes individual Chromatogram objects containing the chromatographic data
#in a 2-D array: columns represent samples and rows (optionally) m/z and/or retention time ranges. To create a total ion chromatogram aggregationFun is set to "sum", and for the base peak chromatogram (BPC)
#for each file in our experiment, aggregationFun is set to "max" to return for each spectrum the maximal intensity and hence create the BPC from the raw data. )
TICs <- chromatogram(raw_data, aggregationFun = "sum")
# Plot according to experimental groups of different fecal collection time points
group_colors <- brewer.pal(7, "Dark2")[1:length(levels(pData(raw_data)$class))]
names(group_colors) <- levels(pData(raw_data)$class)
plot(TICs, col = group_colors[raw_data$class], main = "TIC")
#FIGURE 1
#Most peaks are eluted early at < 230 seconds, suggesting need to explore appropriate chromatography column resins
#Due to the complexity of chemical matrices, short chromatographic times, and poor LC/MS chromatographic resolution,
#TIC usually represents an envelop of metabolites overlap. Thus, TICs only allows an overview of the technical measurements
#performance with no inferred biological insights.
#Comparing TIC intensities from samples entering study to identify potential technical outliers:
tc <- split(tic(raw_data), f = fromFile(raw_data))
boxplot(tc, col = group_colors[raw_data$class],ylab = "intensity", names = row.names(pheno),main = "Total ion current")
#FIGURE 2
#Evidently , the initial QC samples 1-9, 22, 29 are outliers with no detected ion current. I will not later include these
#in subset of QC samples for xcms retention time alignment step.
#EXERCISE 2.4:Extract Tryptophan ion chromatogram ([M-H]-) (hint: retention time ~ 2.99 minutes). #Which sample does apparently hold higher levels of Tryptophan?. Plot the extracted ion chromatogram for this sample. #Can you confirm this peak being Tryptophan from MS1 data? Please refer to Figure for the extracted trypophan ion chromatogram.
Using two different approaches, the sample apparently holding higher levels of tryptophan was determined to be either CTR_day28 sample with file index 3 and sample file ID 034_ID_18.mzM(more likely) or CTR_day7 sample with file index 32 and sample file ID 027_ID_2.mzML (highest base peak intensity) I cannot confirm that this peak is tryptophan from MS1 data. Only elemental composition can be inferred. I infer formula for element composition from intact ion but MS2 fragmentation is needed to elucidate and determining molecular structure for ultimate identification with higher-level certainty.Although the precursor m/z of our MS1 spectra matches the m/z of tryptophan, until we examine further as in Exercise 2.5, we can still not exclude that their MS/MS (MS2) spectra represent fragments of ions from different compounds (with the same m/z than tryptophan).Please refer to code below for rationale:
#I extracted XIC for tryptophan [M+H]- considering a 10 ppm error window and retention time ~ 2.99 minutes:
#The following preliminary Centwave parameters ONLINE default parameters were obtained from the url https://xcmsonline.scripps.edu/landing_page.php?pgcontent=mainPage
#and course tutorial.
cw_onlinexcms_default <- CentWaveParam(ppm = 15, peakwidth = c(5, 20),
snthresh = 6, prefilter = c(3, 100),
mzCenterFun = "wMean", integrate = 1L,
mzdiff = -0.001, fitgauss = FALSE,
noise = 0, verboseColumns = FALSE,
roiList = list(), firstBaselineCheck = TRUE,
roiScales = numeric())
#However, as instructed for this exercise, I adhered to the following author parameters obtained from the following scientific publication by authors Pilsner et al:
#Neuroactive metabolites and bile acids are altered in extremely premature infants with brain injury
#Manuel Pristner, Daniel Wasinger, David Seki, Katrin Klebermaß-Schrehof, Angelika Berger, David Berry, Lukas Wisgrill, Benedikt Warth
#doi: https://doi.org/10.1101/2023.05.17.23290088
#Published in Cell Reports Medicine doi: 10.1016/j.xcrm.2024.101480
#Based on authors publication, "all experimental sampling time points from their respective biological matrix were processed together.
#Data pre-processing was done using the R (4.1) package XCMS80 (3.14), starting with the import of the mzXML files.
#The next step was peak detection using the centWave algorithm (parameters: ppm = 5, peakwidth = 5/15; snthresh = 10, prefilter = 3/5000)
#followed by the removing of peaks with a too wide peak width":
cw_authorsxcms_default <- CentWaveParam(ppm = 5, peakwidth = c(5, 15),
snthresh = 10, prefilter = c(3, 5000),
mzCenterFun = "wMean", integrate = 1L,
mzdiff = -0.001, fitgauss = FALSE,
noise = 0, verboseColumns = FALSE,
roiList = list(), firstBaselineCheck = TRUE,
roiScales = numeric())
#Define the rt and m/z range of the tryptophan peak
#Found monoisotopic mass for tryptophan and defined [M+H]- adduct:
#Because of negative ionization mode, the proton mass is substracted from monoisotopic mass
M <- 204.089877638 # monoisotopic molecular weight mass from https://hmdb.ca/metabolites/HMDB0030396
H <- 1.007276
M_H <- M-H
## Define tryptophan peak mz range considering 10 ppm error
error <- 10 #ppm
mmu_min <- M_H-(error*M_H/1e6)
mmu_max <- M_H+(error*M_H/1e6)
mzr <- c(mmu_min, mmu_max)
#Define tryptophan peak rt range width with minute to second unit conversion
apex.tryptophan<-2.99*60
#Using author parameters
rt.window <- cw_authorsxcms_default@peakwidth[2] #max rt width in seconds from course tutorial
rtr <- c(apex.tryptophan-rt.window, apex.tryptophan+rt.window)
raw_data |> filterRt(rt = rtr) |> filterMz(mz = mzr) |> chromatogram(aggregationFun = "max")|>plot()
#Right shoulder of peaks in all that is visible.
#Maybe retention time settings with peakwidth[2] is inappropriate?
#raw_data |> filterRt(rt = rtr) |> filterMz(mz = mzr) |> chromatogram(aggregationFun = "max")|>plot(type = "XIC")
#Error in plot.xy(xy, type, ...) : invalid plot type 'X'
#m/z values of the individual centroids (lower panel) in the plots above would scatter around the real m/z value of the compound
#(in an intensity dependent manner).ROIs are defined based on the difference of m/z values of centroids from consecutive scans (spectra).
#Centroids from consecutive scans are included into a ROI if the difference between their m/z and the mean m/z of the ROI is smaller
#than the user defined ppm parameter. A reasonable choice for the ppm could thus be the maximal m/z difference of data points from
#neighboring scans/spectra that are part of a chromatographic peak for an internal standard of known compound.
#Because the last tryptophan ion chromatogram was not representative, I then used instead default Centwave
#parameters and wider retention time window of +=50 seconds to extract XIC for tryptophan [M+H]- to improve plot
#Getting default parameters
CentWaveParam()
## Object of class: CentWaveParam
## Parameters:
## - ppm: [1] 25
## - peakwidth: [1] 20 50
## - snthresh: [1] 10
## - prefilter: [1] 3 100
## - mzCenterFun: [1] "wMean"
## - integrate: [1] 1
## - mzdiff: [1] -0.001
## - fitgauss: [1] FALSE
## - noise: [1] 0
## - verboseColumns: [1] FALSE
## - roiList: list()
## - firstBaselineCheck: [1] TRUE
## - roiScales: numeric(0)
## - extendLengthMSW: [1] FALSE
## - verboseBetaColumns: [1] FALSE
# Define error in ppm and the mz range
error <- 10 #ppm
mmu_min <- M_H-(error*M_H/1e6)
mmu_max <- M_H+(error*M_H/1e6)
mzr <- c(mmu_min, mmu_max)
#Assume retention time corresponds with apex peak:
apex.tryptophan<-2.99*60
ahigh<-apex.tryptophan+50
alow<-apex.tryptophan-50
ahigh
## [1] 229.4
alow
## [1] 129.4
rt = c(alow, ahigh)
#Extracting and plotting XIC for Tryptophan with max and black and white images
raw_data |> filterRt(rt = c(alow, ahigh)) |> filterMz(mz = mzr) |> chromatogram(aggregationFun = "max")|>plot()
#FIGURE 3
#This chromatogram looks much better. In hindsight, I should have additionally tested author centwave parameters with this
#new retention time window, but I ran out of time and did not.
#Extracting and plotting XIC for Tryptophan again in color-coded images
group_colors <- brewer.pal(7, "Dark2")[1:length(levels(pData(raw_data)$class))]
names(group_colors) <- levels(pData(raw_data)$class)
chr_tryptophan <- chromatogram(raw_data, mz = mzr, rt = c(alow, ahigh))
plot(chr_tryptophan, col = group_colors[chr_tryptophan$class], lwd = 2)
#FIGURE 4
#This chromatogram also looks much better.
#Determining which sample yields XIC for Tryptophan with highest intensity
#Subsetting with filters for retnetion time and mz range
raw_data_sub<-filterRt(raw_data, rt = c(alow, ahigh))
raw_data_sub<-filterMz(raw_data_sub, mz = mzr)
raw_data_chrom<-chromatogram(raw_data_sub,aggregationFun = "max") #RENAME FOR TRYPTOPHAN?
#Iterate through all samples
my_vector = 1:109
filename1<-(fileNames(raw_data_sub))[which.max(sapply(my_vector, function(x) max(intensity(raw_data_chrom[1,x])[!is.na(intensity(raw_data_chrom[1,x]))])))]
filename1
## [1] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day28\\034_ID_18.mzML"
#Sample has file index 3, but we need the SAMPLE ID:
#Print row where filename substring is found in rownames of pheno dataframe
sub1<-str_extract(filename1, "[0-9][0-9][0-9]")
row1<-pheno[str_detect(rownames(pheno), sub1), ]
row1
## [1] CTR_day28
## Levels: CTR_day28 CTR_day7 CTR_termage PAT_day28 PAT_day7 PAT_termage QC
#CTR_day28 sample with file index 3 and sample ID 034_ID_18.mzML has highest intensity for tryptophan
#Plot this single peak using its obtained file index of 3
plot(raw_data_chrom[1,3])
#FIGURE 5
#Altenatively, obtain sample with XIC with highest intensity for tryptophan as follows:
## Using fData to get and return as a data.frame to access beasePeakIntensity variable
eddie<-(fData(raw_data_sub))
max_eddie<-eddie[eddie$basePeakIntensity == max(eddie$basePeakIntensity), ]
#Get file index again:
filename2<-(fileNames(raw_data_sub))[max_eddie$fileIdx]
filename2
## [1] "C:\\Users\\User\\Desktop\\R\\Metabolomics\\Metabolomics\\MS1\\CTR_day7\\027_ID_2.mzML"
#Sample has file index 3, but we need the SAMPLE ID:
#Printing row where filename substring is found in rownames of pheno dataframe
sub2<-gsub(".*([0-9][0-9][0-9][0-9]*_.*mzML).*", "\\1", filename2)
sub22<-substring(sub1, 1, 8)
sub22
## [1] "034"
#Print row where substring is found in rownames of pheno dataframe
#row2<-pheno[str_detect(rownames(pheno), sub22), ]
row2<-pheno[rownames(pheno) %like% sub22, ]
row2
## [1] CTR_day28
## Levels: CTR_day28 CTR_day7 CTR_termage PAT_day28 PAT_day7 PAT_termage QC
#CTR_day7 sample with file index 32 and sample file ID 027_ID_2.mzML has highest intensity for tryptophan
#Plot this single peak using its obtained file index of 32
plot(raw_data_chrom[1,32])
#FIGURE 6
#This appears to be lower in intensity than the first peak.
#Therefore, I assume that the first tryptophan maximum intensity peak with file index=3 is the correct peak.
#EXERCISE 2.5 Read MS/MS spectrum for Tryptophan from samples in MS/MS (acquired in HILIC ESI(-), [M-H]- ) and compare to those #MS/MS spectra tabulated on public libraries. Compute similarity scores and comment on it. Which level of ID would you #be obtaining with these empirical data? Reason your responses. Do the same for uracil and aspartic acid.
In addition to high-scoring matches between my empirical tryptophan MS/MS spectra and those of NIST, MassBank, and HMDB, the authors in their scientific publication “observed metabolic changes at the pathway level Enriched metabolic pathways and their combined p values for (A) plasma, (B) feces, and (C) their intersection.logfold increase change of tryptophan observed in feces termage but not detected in 7 and 28 days”.The Authors also observe “significant alterations between groups, including uracil (ICL = 1, p = 0.041,FC-log2 = 3.4) on day 7 feces. Furthermore, the authors observed”significant negative correlations with the neurotransmitter GABA (ICL = 1) and N-acetyl-L-aspartic acid (ICL = 1), the precursor of the neuronal dipeptide N-acetylaspartylglutamate, and N-acetyl-L-aspartic acid (ICL = 1) was correlated with Enterococcus”. As explained below in greater detail, based on (1) the authors’ observations in their peer-reviewed scientific publication, (2) high MS/MS spectral matches (93.4% similarity score) and similiar intensity distributions between those of my empirical spectra and those from with those of NIST, HMDB, and MassBank (93.4% similarity score), and (3) previously demonstrated correspondence with XIC pre-cursor MS1 ion chromotagram peak, I conclude with level 2 certainty that the previous MS1 pre-cursor chromatogram peak represented elemental composition of tryptophan and that the corresponding MS2 spectra represents all the major MS2 fragments that confirm the structure and identity of tryptophan in the fecal samples. I have lower level of certainty for L-aspartic acid (35% similarity score with MassBank spectra), and even lower level-certainty for uracil due to extremely low similarity scores against MassBank MS spectral library. These low scores are potentially attributed to inappropriate selection of retention time width as explained before, my varied and inconsistent use of different ppm values for the compareSpectra and filterPrecursorMzValues() methods, differences in collision energies and/or Mass Spectrometer instrumentation, batch-effects, procedures, and other conditions. THis can also be attributed to the uncertainty around the negative or positive nature of ionization of acids like L-aspartic acid.Please refer to approach and code writtn below for this comparison of spectra:
#For these 3 metabolites, I will refer to the following confidence Levels of Metabolite identification:
#Confidence levels:
#Level 4: Unknown; spectral data
#Level 3: Putatively characterised compound classes; Spectral data showing similarity to known compounds of a
#chemical class (i.e., reporter ions of polar head of PC, flavonoids ring structure and so on)
#Level 2: Putatively annotated structure; As for levels 3 and 4, including spectral similarity with public or commercial libraries
#Level 1: Identified compounds; a minimum of two orthogonal analytical techniques applied to the analysis
#of both the metabolite of interest and to a chemical reference standard of suspected structural equivalence,
#with all analyses performed under identical analytical conditions (Compare MS/MS and RT to a known compound model)
#Extracting MS/MS spectrum for tryptophan, uracil, and L-aspartic acids+ at collision energy CE= 40 eV which is previously determined via code.
#Then comparing these with spectrum tabulated in NIST subscription-based library and public libraries (MAssBank, HMDB)
#Data import
#Change file path accordingly!!
path_mzXML_MS2 <- "C:/Users/User/Desktop/R/Metabolomics/Metabolomics/MS2"
#Listing all 18 MS/MS .mzXML files (NOTE: NOT mzML!!) contained in my MS2 working directory
mzXML_files_MS2 <- list.files(path_mzXML_MS2, pattern= ".mzXML", recursive=T, full.names = T)
#Create a phenodata dataframe from this working directory structure
pheno_MS2 <- phenoDataFromPaths(mzXML_files_MS2)
#Read open MS data (.mzML files) with readMSData function from the MSnbase package.
raw_data_MS2 <- readMSData(files = mzXML_files_MS2,pdata = new("NAnnotatedDataFrame", pheno_MS2), mode = "onDisk")
#Preliminary data exploration:
#Total MS spectra these files contained and proportions of them as MS and MS/MS spectra:
mz.conditions2 <- MSnbase::fData(raw_data_MS2)
ms_levels2 <- table(mz.conditions2$msLevel)
ms_levels2
##
## 1 2
## 63168 11285
#The polarity used for acquisition
polarities2 <- table(mz.conditions2$polarity)
head(polarities2)
##
## 0
## 74453
#Collision energies for MS/MS data:
ce2 <- table(mz.conditions2$collisionEnergy)
head(ce2)
##
## 40
## 11285
###################TRYPTOPHAN#######################################################################################################
### Although the precursor m/z of our MS1 spectra matches the m/z of tryptophan, we can still not exclude that they represent fragments
#of ions from different compounds (with the same m/z than tryptophan).We find the scan where precursor MZ correspond to TRYPTOPHAN adduct:
#Defining tryptophan parameters
#Filtering on pre-cursor with maximum intensity at collision energy
#Tryptophan monoisotopic mass obtained from HMDB website https://hmdb.ca/metabolites/HMDB0030396
M <- 204.089877638
H <- 1.007276
M_H <- M-H
# Note retention time window is rt.window <- cw_authorsxcms_default@peakwidth[2] from hands-on tutorial, rather than the previously attempted window of +-50
# which visually resulted in appropriate plot of chromatogram. Here, I re-attempted wide window of +-50 but but this did not yield high correlation score here.
apex.tryptophan<-2.99*60
rt.window <- cw_authorsxcms_default@peakwidth[2] #max rt width in seconds
rtr <- c(apex.tryptophan-rt.window, apex.tryptophan+rt.window)
spID1 <- mz.conditions2 |> filter(round(precursorMZ) == 203 & collisionEnergy == 40) |> slice_max(totIonCurrent) |> pull(spIdx)
#Plotting and comparing it FIRST to tabulated tryptophan spectrum in the NIST library
ggplotly(plot(raw_data_MS2[[spID1]]))
#FIGURE 7
#Comparing MS/MS tryptophan spectra with that at the following nist url: https://webbook.nist.gov/cgi/cbook.cgi?ID=C54126&Mask=200
#Pre-cursor ion was 203.08 in mass
#NIST M/Z peaks that were present in my spectra but in lower proportion abundance: 51,77,117,103,204,159,130,142
#NIST M/Z peaks that were absent in my spetra 103
#M/Z peaks that were present in my spectra but not in NIST spectra:180
#Additionally, my MS/MS tryptophan spectra compared well visually with spectra obtained at following hmdb url:https://hmdb.ca/metabolites/HMDB0030396
#Additionally, I used Spectra and AnnotationHub packages to compare against MassBank spectra as follows:
#For features identification.MS2 spectra associated with LC-MS features from the untargeted LC-MS/MS experiment will be matched against andsimilaroty-scored
#with other public spectral reference libraries (i.e.MassBank retrieved using Bioconductor’s AnnotationHub package, MassBank of North America MoNa, Human Metabolom Database HMDB, GNPS)
#This can allows reaching Level 2 ID confidence, i.e, inferring probable structure from features.
#We will use Spectra and AnnotationHub packages for level 2 ID of tryptophan using MS/MS spectral data and the MassBank spectral library.
#Matching experimental spectra (MS/MS) representing tryptophan against the tryptophan entry in MassBank for tryptophan structural confirmation
#If no MySQL database system is available or if this tutorial can not be run within docker, the MassBank data can also be retrieved from Bioconductor's R Biocpkg("AnnotationHub").
#In that case MsBackendCompDb backend (defined in the r Biocpkg("CompoundDb")) is used instead, but both backends retrieve their data from a SQL database and
#have the same properties. Retrieving MassBank spectral data using AnnotationHub package:
ah <- AnnotationHub()
query(ah, "tryptophan")
## AnnotationHub with 0 records
## # snapshotDate(): 2024-04-30
query(ah, "MassBank")
## AnnotationHub with 6 records
## # snapshotDate(): 2024-04-30
## # $dataprovider: MassBank
## # $species: NA
## # $rdataclass: CompDb
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH107048"]]'
##
## title
## AH107048 | MassBank CompDb for release 2021.03
## AH107049 | MassBank CompDb for release 2022.06
## AH111334 | MassBank CompDb for release 2022.12.1
## AH116164 | MassBank CompDb for release 2023.06
## AH116165 | MassBank CompDb for release 2023.09
## AH116166 | MassBank CompDb for release 2023.11
#Retrieve the respective release (2023.11 in this case) and access its spectra data with the code below.
#Note that AnnotationHub will cache the downloaded database locally and any subsequent call will not download the database again.
mb <- ah[["AH116166"]]
## loading from cache
mbank <- Spectra(mb)
#dev.off()#to fix Error in plot.new() : figure margins too large
par(mar = c(1, 1, 1, 1))
mbank_sub <- filterPrecursorMzValues(mbank, mz = M_H, ppm = 10)
plotSpectra(mbank_sub, main = mbank_sub$name, labels = function(z) format(mz(z)[[1L]], digits = 6), labelSrt = -30, labelPos = 2, labelOffset = 0.1, ylim = c(0,200))
#FIGURE 8
#Assumed that this is most likely plot formatting related, but just in case, maybe 1L (positive ionization mode polarity)
#be replaced with OL (negative ionization polarity for our experiment?)Calculating the similarity against our empirical spectra
#representing tryptophan using compareSpectra function from Spectra package. Filter empirical data according to m/z and rt of tryptophan ion.
#As compareSpectra result we got the (normalized dot product) similarity score between each tested MassBank spectrum
#(columns) against each experimental spectrum (rows).
#Data Import
#Re-Accessing MS/MS data in OPEN-SOURCE FILE FORMAT mzXML file:
#This seems to work with individual file path, not folder directory path like before for MS1 processing.
#I am therefore randomly using first file I have as an example
mz.file <- "C:/Users/User/Desktop/R/Metabolomics/Metabolomics/MS2/MS2_1/Feces_MS2_neg_1.mzXML"
mz.files <- list.files(path = "C:/Users/User/Desktop/R/Metabolomics/Metabolomics/MS2/MS2_1", full.names = T)
sps_mzfile <- Spectra(mz.file, source = MsBackendMzR())
# Subset sps_mzfile for tryptophan according to precursor mz and rt (252, 257) seconds, recall
# appex for tryptophan is at 254. We also filter out peaks below 30% of the intensity of the base peak
low_int <- function(x, ...) {
x > max(x, na.rm = TRUE) * 0.30
}
#Define a function to *normalize* the intensities
norm_int <- function(x, ...) {
maxint <- max(x[, "intensity"], na.rm = TRUE)
x[, "intensity"] <- 100 * x[, "intensity"] / maxint
x
}
sps <- filterPrecursorMzValues(sps_mzfile, mz = M_H, ppm = 5) |> filterRt(c(apex.tryptophan-rt.window, apex.tryptophan+rt.window)) |> filterIntensity(intensity = low_int) |> addProcessing(norm_int)
res<-compareSpectra(sps, mbank_sub, ppm = 20)
## 'MsBackendCompDb' does not support parallel processing. Switching to serial processing.
res
## 28482 28483 28484 80539 80540 80541 80542
## [1,] 0.2601301 0.5809850 0.6433594 0.3361240 0.9337322 0.4026592 0.8563523
## [2,] 0.2712289 0.5885159 0.6388738 0.3456317 0.9313043 0.4086955 0.8615021
## [3,] 0.3151161 0.5723476 0.7523610 0.3466106 0.7704272 0.3676487 0.7728102
## [4,] 0.3124179 0.5693843 0.7540120 0.3445437 0.7713205 0.3665971 0.7718526
## [5,] 0.1563476 0.2959949 0.4075552 0.2110402 0.4116454 0.1899856 0.4061539
## [6,] 0.3164236 0.5579384 0.7472043 0.3491116 0.7743559 0.3716477 0.7743275
## 80543 80544 80545 80546 80547 80548 80549
## [1,] 0.3935683 0.2705255 0.9438583 0.2932669 0.4190130 0.6974594 0.6309956
## [2,] 0.3998078 0.2667958 0.9398876 0.3045048 0.4295397 0.7061882 0.6413287
## [3,] 0.3567606 0.1921985 0.7898868 0.3540963 0.4250874 0.6489704 0.6049819
## [4,] 0.3556293 0.1931772 0.7909080 0.3514592 0.4228882 0.6471279 0.6027566
## [5,] 0.2145166 0.1052241 0.4233565 0.1771046 0.2160969 0.3368564 0.3119363
## [6,] 0.3605103 0.1951030 0.7906000 0.3560313 0.4285508 0.6505699 0.6067678
## 80550 80551 80552 80553 80554 80555 80556
## [1,] 0.8970065 0.1759173 0.25378615 0.4801076 0.3245288 0.9078604 0.7739032
## [2,] 0.9000748 0.1859038 0.24967425 0.4896842 0.3354908 0.9081569 0.7808572
## [3,] 0.7934221 0.2153601 0.16889847 0.4726654 0.3904994 0.7700048 0.7013363
## [4,] 0.7929191 0.2127464 0.16997389 0.4708365 0.3880490 0.7701547 0.6998718
## [5,] 0.4189874 0.1260159 0.09303822 0.2425186 0.1969600 0.4089245 0.3663108
## [6,] 0.7946589 0.2153459 0.17257566 0.4772438 0.3931735 0.7721316 0.7023746
## 80557 89496 91390 91394 91398 91402 91406 91410 91414 91418 91422
## [1,] 0.3425051 0 0 0 0 0 0 0 0 0 0
## [2,] 0.3522501 0 0 0 0 0 0 0 0 0 0
## [3,] 0.3519985 0 0 0 0 0 0 0 0 0 0
## [4,] 0.3498734 0 0 0 0 0 0 0 0 0 0
## [5,] 0.1778430 0 0 0 0 0 0 0 0 0 0
## [6,] 0.3545077 0 0 0 0 0 0 0 0 0 0
## 91426 91430 91434 107569 112840 112845 112847
## [1,] 0 0 0 0 0.5301556 0.5171368 0.5211996
## [2,] 0 0 0 0 0.5350838 0.5046611 0.5086686
## [3,] 0 0 0 0 0.4774361 0.3409526 0.4384038
## [4,] 0 0 0 0 0.4768729 0.3441184 0.4424745
## [5,] 0 0 0 0 0.2496511 0.2303748 0.2463075
## [6,] 0 0 0 0 0.4835764 0.4251909 0.4494597
#The highest similarity between our spectra and the spectra from MassBank is max(res).
max(res)
## [1] 0.9438583
plotSpectraMirror(sps[2], mbank_sub[1], labels = function(z) format(mz(z)[[1L]], digits = 6),labelSrt = -30, labelPos = 2, labelOffset = 0.1)
#FIGURE 9
#Results shows a cosinus similarity score of 94.3% between any of the two Mass Bank tabulated tryptophan spectra and the empirical spectra sps number two.
#Matched scores (cosine similarty > 80%) are usually considered as potential MS/MS good matches.
#However, closer attention should be paid when interpreting such similarity scores.
#We should consider whether experimental conditions in which the empirical spectra was acquired and those used to record the
#tabulated spectra are comparable. To explore these conditions the spectra metadata can be accessed as follows:
#Consider the spectral variables available
head(spectraVariables(mbank_sub))
## [1] "msLevel" "rtime" "acquisitionNum" "scanIndex"
## [5] "dataStorage" "dataOrigin"
head(spectraVariables(sps))
## [1] "msLevel" "rtime" "acquisitionNum" "scanIndex"
## [5] "dataStorage" "dataOrigin"
#Compare between collisions energy
mbank_sub[2]$collisionEnergy_text
## [1] "20"
mbank_sub[2]$instrument
## [1] "maXis plus UHR-ToF-MS, Bruker Daltonics"
sps[2]$collisionEnergy
## [1] 40
#Despite that the instrument was Q-TOF in both cases, the different vendors and CE used can cause different similarity scores
#Alternatively, subscription payment-based NIST MS/MS entries measured with same instrument and collision energy may in the future show better peaks matching.
#Based on peer-reviewed scientific publication, MS/MS spectal mataches, similar distribution of intensities, correspondence with XIC pre-cursor MS1 ion chromotagram peak,
#I conclude with level 2 certainty that the previous MS1 pre-cursor chromatogram peak represented elemental composition and that the
#corresponding MS2 spectra confirms the structure and identity of tryptophan
#to identify it in my sample with level 2 certainty. I have all the major fragments in my empirical spectra, and these all match those in the MassBank, NIST, HMDB library spectra.
###################################URACIL############################################################################
#URACIL
### Find the scan where precursorMZ correspond to URACIL adduct:
# Defining uracil peak mz values
MU <- 112.027277382 # URACIL monoisotopic mass and MS MS spectra: https://hmdb.ca/metabolites/HMDB0000300
H <- 1.007276
M_H <- M-H
#Retention time from for uracil using Hydrophillic chromatography-Negative ESI Mode for that instrument(did not use xcms online as I have no account):
#Retention time based on following reference doi: 10.1016/j.chroma.2017.08.050
#"Accurate Prediction of Retention in Hydrophilic Interaction Chromatography (HILIC) by Back Calculation of
#High Pressure Liquid Chromatography (HPLC) Gradient Profiles" by Nu Wang1 and Paul G. Boswell
apex.uracil<-2.1969*60
rt.window <- cw_authorsxcms_default@peakwidth[2] #max rt width in seconds
rtr <- c(apex.uracil-rt.window, apex.uracil+rt.window)
spID2 <- mz.conditions2 |> filter(round(precursorMZ) == 111 & collisionEnergy == 40) |> slice_max(totIonCurrent) |> pull(spIdx)
ggplotly(plot(raw_data_MS2[[spID2]]))
#FIGURE 10
#Comparing spectra for uracil with that at the following nist url: https://webbook.nist.gov/cgi/cbook.cgi?ID=C54126&Mask=200
#Pre-cursor ion was 299.26 in mass
#NIST M/Z peaks that were present in my spectra but in lower proportion abundance: 112,69,42
#NIST M/Z peaks that were absent in my spetra 51, 53
#M/Z peaks were present i my spectra but not in NIST spectra:
#Additionally, my MS/MS spectra compared well visually with spectra obtained at following hmdb url:https://hmdb.ca/metabolites/HMDB0000300
#Additionally used Spectra and AnnotationHub packages to compare against MassBank spectra as follows:
par(mar = c(1, 1, 1, 1))
mbank_sub <- filterPrecursorMzValues(mbank, mz = M_H, ppm = 10)
plotSpectra(mbank_sub, main = mbank_sub$name, labels = function(z) format(mz(z)[[1L]], digits = 6), labelSrt = -30, labelPos = 2, labelOffset = 0.1, ylim = c(0,200))
#FIGURE 11
sps <- filterPrecursorMzValues(sps_mzfile, mz = M_H, ppm = 5) |> filterRt(c(apex.uracil-rt.window, apex.uracil+rt.window)) |> filterIntensity(intensity = low_int) |> addProcessing(norm_int)
res<-compareSpectra(sps, mbank_sub, ppm = 20)
## 'MsBackendCompDb' does not support parallel processing. Switching to serial processing.
max(res)
## Warning in max(res): no non-missing arguments to max; returning -Inf
## [1] -Inf
#Results shows a cosinus similarity score of 0 % between any of the two Mass Bank tabulated uracil spectra
#and the empirical spectra sps number two. We now plot the correspondence using plotSpectraMirror function.
#plotSpectraMirror(sps[2], mbank_sub[1], labels = function(z) format(mz(z)[[1L]], digits = 6),labelSrt = -30, labelPos = 2, labelOffset = 0.1)
#Error in h(simpleError(msg, call)) :
# error in evaluating the argument 'x' in selecting a method for function 'plotSpectraMirror': index out of bounds: index has to be between 1 and 0
#FIGURE 12
#Compare
mbank_sub[2]$collisionEnergy_text
## [1] "20"
mbank_sub[2]$instrument
## [1] "maXis plus UHR-ToF-MS, Bruker Daltonics"
#sps[2]$collisionEnergy
#Error in i2index(i, length(x), rownames(x@spectraData)) :
# index out of bounds: index has to be between 1 and 0
#Authors also in their peer-reviewed publication observe "significant alterations between groups, including uracil (ICL = 1, p = 0.041,FC-log2 = 3.4) on day 7 feces
#Despite this, because of the low spectral mataches between my emprical MS/MS spectra and those of NIST, HMDB, and MassBank,
#I cannot conclude with any certainty that I have identified uracil in my samples or that the corresponding MS2 fragmentation spectra confirms the structure and identity of uracil in my sample
#Based on my poor MS/MS spectral match with MassBank spectra, I may require MS1 extracted uracil chromatographic peak assessment and also
#comparison with spectral library where similar collision energy was used, as 10eV is substantially less than my 40eV used
#########################################L-ASPARTIC ACID##########################################################################
#L-ASPARTIC ACID
#Define parameters:
MH <- 133.037507717 # monoisotopic molecular weight according to HMDB https://hmdb.ca/metabolites/HMDB0000191
H <- 1.007276
M_H <- M-H #QUESTION: DO ACIDS MASSES GET DEDUCTED BY PROTON FOR IONIZATION?
#Retention time for L-aspartic acid from for Hydrophillic chromatography-Negative ESI Mode for that instrument based on following reference:
#"An isocratic fluorescence HPLC assay for the monitoring of l-asparaginase activity and l-asparagine depletion in children
#receiving E. colil-asparaginase for the treatment of acute lymphoblastic leukaemia
#Christa E Nath 1, Luciano Dallapozza, Adam E Eslick, Ashish Misra, Deborah Carr, John W Earl
#PMID: 18823071 DOI: 10.1002/bmc.1096
apex.asparticacid<-3.5*60
rt.window <- cw_authorsxcms_default@peakwidth[2] #max rt width in seconds. Is this the correct width to be using? I wish I knew as this affects the outcome
rtr <- c(apex.asparticacid-rt.window, apex.asparticacid+rt.window)
spID3 <- mz.conditions2 |> filter(round(precursorMZ) ==132 & collisionEnergy == 40) |> slice_max(totIonCurrent) |> pull(spIdx)
ggplotly(plot(raw_data_MS2[[spID3]]))
#FIGURE 13
#Comparing spectra for L-aspartic acid with that at the following nist url: #https://webbook.nist.gov/cgi/cbook.cgi?ID=C54126&Mask=200
#Pre-cursor ion was 224.06 in mass
#NIST M/Z peaks were present in my MS/MS spectra but in lower proportion abundance: 90,133,115,70,43,28,18
#NIST M/Z peaks were absent in my MS/MS spectra:NA
#M/Z peaks were present in my spectra but not in NIST spectra:NA
#Additionally, my MS/MS spectra compared well visually with spectra obtained at following hmdb https://hmdb.ca/metabolites/HMDB0000191
#Additionally, I used Spectra and AnnotationHub packages to compare against MassBank MS2 spectra with similarity score as follows:
par(mar = c(1, 1, 1, 1))
mbank_sub <- filterPrecursorMzValues(mbank, mz = M_H, ppm = 10)
plotSpectra(mbank_sub, main = mbank_sub$name,labels = function(z) format(mz(z)[[1L]], digits = 6), labelSrt = -30, labelPos = 2, labelOffset = 0.1, ylim = c(0,200))
#FIGURE 14
sps <- filterPrecursorMzValues(sps_mzfile, mz = M_H, ppm = 5) |> filterRt(c(apex.asparticacid-rt.window, apex.asparticacid+rt.window)) |> filterIntensity(intensity = low_int) |> addProcessing(norm_int)
res<-compareSpectra(sps, mbank_sub, ppm = 20)
## 'MsBackendCompDb' does not support parallel processing. Switching to serial processing.
max(res)
## [1] 0.3577573
#Results shows a cosinus similarity score of 35.78 % between any of the two Mass Bank tabulated L-aspartic acid spectra and the empirical spectra sps number two.
#We now plot the correspondence using plotSpectraMirror function.
#plotSpectraMirror(sps[2], mbank_sub[1], labels = function(z) format(mz(z)[[1L]], digits = 6),labelSrt = -30, labelPos = 2, labelOffset = 0.1)
#ERROR FIGURE 15
#Error in h(simpleError(msg, call)) :
# error in evaluating the argument 'x' in selecting a method for function 'plotSpectraMirror': index out of bounds:
#Compare between collisions energy
mbank_sub[2]$collisionEnergy_text
## [1] "20"
mbank_sub[2]$instrument
## [1] "maXis plus UHR-ToF-MS, Bruker Daltonics"
#I conclude that the corresponding MS2 NIST spectra confirms the structure of L-asparticacid and that with additional
#MS1 pre-cursor ion spectra confirming elemental composiition we may be able conclude with level 2 certainty.However,
# I cannot rely on also a MassBank spectral library as my R-based coding to compare is not compiling or executing properly)
#I conclude nonetheless with Level 2 certainty that I can identify L-aspartic acid apppropriate structure based on MS/MS spectra
#visual comparison with NIST spectral library.
#EXERCISE 3: Run a full XCMS analysis for HILIC negative ionization
mode. Stick to xcms parameters defined by authors.
#EXERCISE 3.1 Explain in your own words each one of the steps in the xcms workflow. Which is the rationale of performing different chromatographies #and using different ionization modes?
Ionization dictates the type of mass spectrum.Different metabolites ionize differently an /or more efficeintly, and this is why both negative and positive ionization modes should be performed for MS. Different metabolites derived from different cellular environments (i.e. hydropphoc membranes vs. soluble proteins in aqueous cytoplasm) will also separate differently from mixtures of other metabolites, and therefore appropriate Higher resolution separation can be achieved by targeting chromatographic technique to more differentiable and unique separation properties of metabolite (i.e. hydrophobicity via Reverse-Phase (RP-HPLC) or hydrophobic interaction (HI)resin, size or molecular weighted via size-exclusion chromatogaphy resin, strong negative charge via anion-exchange chromatography, strong positive charge via cation exchange chromatogrphy, strong affinity via streptavaid or 6x-polyhistidine affinity chromatography resins, etc. The following code below implements the xcms process:
#XCMS
#XCMS STEP1:
#Chromatographic peak detection is performed on extracted ion chromatograms, which helps testing and tuning peak detection settings.
#Running centWave chromatographic peak detection on tryptophan peak using recommended publication authors parameters (not Scripps on-line xcms or default)
#Centwave parameters. Running centWave algorithm on EICs implies different background signal estimation since less data is available.
#Thus, different settings for snthresh are needed (generally a lower snthresh will be used for EICs since the estimated background
#signal tends to be higher for data subsets than for the full data):
#Finding peaks on tryptophan chromatogram
xchr_tryptophan <- findChromPeaks(chr_tryptophan, param = cw_authorsxcms_default)
chromPeaks(xchr_tryptophan)
## rt rtmin rtmax into intb maxo sn row
## [1,] 159.943 153.151 167.538 26415876.0 26280958.6 3709441.8 261 1
## [2,] 159.156 152.517 165.675 62944705.0 62605410.3 9028823.0 216 1
## [3,] 159.986 153.446 166.755 90298632.5 88991661.7 13831585.0 117 1
## [4,] 181.672 166.755 188.196 8700887.2 8331549.2 1577026.0 65 1
## [5,] 159.978 153.281 166.629 87443617.2 86662008.5 11986134.0 146 1
## [6,] 159.360 152.821 166.071 6953116.4 6875406.8 998987.5 85 1
## [7,] 159.524 152.699 166.203 19077213.9 18859836.6 2709901.0 97 1
## [8,] 159.807 153.160 167.331 44631378.7 43907854.3 5918932.5 73 1
## [9,] 159.526 152.849 167.151 23020442.9 22260752.8 3144136.5 43 1
## [10,] 159.492 152.923 166.179 20075328.8 19824715.9 3140314.0 122 1
## [11,] 159.042 152.498 165.535 71005924.4 69902225.2 11135381.0 111 1
## [12,] 180.544 165.535 187.155 8332359.8 7995094.0 1264901.1 58 1
## [13,] 158.957 152.500 166.416 34849302.5 32085485.6 4672532.5 45 1
## [14,] 182.392 166.416 188.792 4188670.5 4111802.1 649968.4 38 1
## [15,] 159.641 152.260 167.191 6747775.4 6608860.4 886687.3 58 1
## [16,] 159.048 152.172 165.629 14976377.7 14778515.1 2244440.5 108 1
## [17,] 159.769 153.138 166.423 36603158.0 35683154.4 5766347.5 56 1
## [18,] 159.552 152.869 166.274 3922370.8 3915611.4 579253.1 296 1
## [19,] 159.400 152.899 166.000 27300424.9 26999348.7 4160065.2 158 1
## [20,] 159.349 152.863 165.960 92280662.6 90503865.6 13318615.0 70 1
## [21,] 179.003 165.960 185.610 12245035.7 11702983.9 2119715.2 111 1
## [22,] 160.260 154.383 166.830 26623538.2 26187284.9 4641643.5 82 1
## [23,] 158.836 152.054 166.420 23794284.9 23282663.5 3364501.5 61 1
## [24,] 159.278 152.795 165.890 73375219.0 72681318.4 11528979.0 143 1
## [25,] 158.679 152.080 166.215 17603197.9 17226059.1 2368312.0 82 1
## [26,] 159.534 152.062 166.137 55725071.0 54026497.0 7969349.5 55 1
## [27,] 180.283 166.137 187.933 8158289.1 7822010.9 1195368.8 67 1
## [28,] 160.546 150.110 170.098 21214636.6 20704633.7 2958584.2 69 1
## [29,] 158.969 152.342 165.621 17492038.4 17259592.4 2773273.5 122 1
## [30,] 159.683 152.112 167.339 22441856.1 22256121.6 2738155.2 136 1
## [31,] 158.704 152.202 166.223 81452505.3 78663584.1 12012499.0 55 1
## [32,] 181.406 166.223 188.035 12033007.9 11658427.6 1683674.2 75 1
## [33,] 159.847 153.071 166.693 4867127.2 4821358.9 676316.8 83 1
## [34,] 160.213 153.579 167.915 9718017.4 9660224.2 1303169.6 171 1
## [35,] 158.878 152.176 165.422 18476894.7 18278366.5 2860269.8 130 1
## [36,] 160.370 156.678 164.976 719283.2 719274.9 196172.6 196172 1
## [37,] 153.896 150.952 156.678 346457.8 346452.1 118695.8 118695 1
## [38,] 158.913 152.262 166.531 26869680.4 26683078.5 3791122.2 151 1
## [39,] 159.130 152.640 166.505 2108709.7 2067169.2 269349.4 48 1
## [40,] 160.266 152.802 169.802 23973238.9 23780234.9 3343299.8 175 1
## [41,] 158.970 152.212 167.531 2840906.2 2835092.5 386196.1 188 1
## [42,] 160.072 153.535 167.673 30986816.0 30540816.6 4468050.5 124 1
## [43,] 159.112 152.179 165.906 753368.3 744980.0 121287.5 61 1
## [44,] 158.943 152.322 166.571 6400112.8 6256893.3 975661.4 85 1
## [45,] 159.479 153.625 167.183 1655539.5 1646549.6 267526.7 102 1
## [46,] 159.801 153.156 165.594 8259341.4 8104316.2 1415972.9 116 1
## [47,] 159.183 152.752 165.865 17213551.5 16847751.4 2757403.5 160 1
## [48,] 159.445 152.764 166.023 19810873.9 19595803.1 2905580.0 111 1
## [49,] 160.118 153.542 167.710 13990839.7 13921982.2 1764214.1 241 1
## [50,] 160.064 153.159 167.673 1926395.6 1913177.5 229596.1 60 1
## [51,] 158.637 150.644 169.269 8859665.1 8253730.2 1649286.6 55 1
## [52,] 159.070 152.552 166.735 2510481.0 2496432.8 315736.9 103 1
## [53,] 159.659 153.244 166.164 15169394.8 14839777.4 2223130.0 111 1
## [54,] 159.378 152.662 166.070 15838430.7 15583202.0 2246020.2 127 1
## [55,] 160.077 152.526 166.817 93169918.2 91984518.5 13803591.0 124 1
## [56,] 159.410 152.757 167.132 28891960.1 28755657.6 3731553.8 194 1
## [57,] 160.151 153.355 167.825 13010606.7 12876393.0 1899735.1 140 1
## [58,] 158.425 151.803 165.903 11011659.0 10790054.0 1633202.1 74 1
## [59,] 159.437 152.732 167.104 16646087.8 16329887.5 2354766.5 61 1
## [60,] 159.243 152.737 166.772 55041402.3 54238608.8 7839548.0 103 1
## [61,] 158.919 152.280 165.576 35997360.8 35337760.4 5114763.5 92 1
## [62,] 159.315 152.647 165.934 18204853.2 17962938.7 2811945.2 112 1
## [63,] 159.298 152.636 166.003 24227496.6 23746510.3 3706322.0 90 1
## [64,] 158.786 151.239 165.362 10552836.2 10459975.7 1564282.8 123 1
## [65,] 159.055 153.154 165.702 963728.1 963714.6 143246.9 143246 1
## [66,] 161.027 153.404 167.740 7309637.9 7287133.7 1044485.5 261 1
## [67,] 159.235 153.603 166.743 7592858.9 7530074.0 1097985.4 128 1
## [68,] 159.145 151.655 165.614 9333766.2 9257222.4 1415297.6 135 1
## [69,] 159.520 152.206 165.929 23039601.3 22769051.7 3348875.5 124 1
## [70,] 180.177 165.929 184.816 2538717.7 2538697.8 529226.4 99966 1
## [71,] 159.823 153.168 166.882 2085654.5 1638093.7 287294.6 12 1
## [72,] 158.804 152.301 165.146 33898859.4 33397116.4 5001331.0 82 1
## [73,] 180.895 169.705 186.522 2716150.8 2716133.9 403777.4 187578 1
## [74,] 158.690 152.186 166.102 30100452.0 29588062.2 4057441.8 75 1
## [75,] 158.271 150.805 165.735 53442722.9 52993673.0 7217708.5 173 1
## [76,] 159.742 152.969 166.407 5911290.1 5666865.4 914279.9 45 1
## [77,] 160.344 152.877 167.082 9106912.6 8954486.6 1149583.4 56 1
## [78,] 159.263 152.848 166.768 22719291.3 22000030.0 3309146.0 61 1
## [79,] 159.840 153.264 166.567 4488179.8 4320979.2 667982.4 45 1
## [80,] 159.577 152.844 167.194 3253826.0 3194021.1 545109.5 61 1
## [81,] 158.802 152.349 166.394 57445092.7 56711216.5 8059387.0 127 1
## [82,] 159.091 151.477 165.778 5712465.5 5600840.1 801146.9 92 1
## [83,] 159.049 152.555 166.665 93932526.4 92476944.6 13238220.0 95 1
## [84,] 159.941 153.054 166.853 2142274.7 2140577.9 353420.9 355 1
## [85,] 159.386 152.643 166.320 1763494.6 1760288.5 274000.3 192 1
## [86,] 159.313 151.093 169.101 3094884.6 3091793.4 362644.4 414 1
## [87,] 160.525 152.539 170.532 2389023.8 2381134.7 275652.2 157 1
## [88,] 161.118 152.169 173.037 2470813.1 2446241.2 320738.2 109 1
## [89,] 159.745 153.895 170.685 3259273.7 3224361.4 376797.5 91 1
## [90,] 160.198 153.472 170.143 3917166.8 3917149.2 521550.8 487010 1
## [91,] 160.526 153.626 168.289 3760899.6 3760884.0 483944.7 483944 1
## [92,] 159.705 152.929 167.385 4124387.0 4088396.7 585741.8 131 1
## [93,] 160.483 153.070 166.946 25685501.9 23724936.9 3767790.5 37 1
## [94,] 161.287 153.792 167.798 24489124.9 24213991.0 3492037.8 115 1
## [95,] 160.887 153.471 167.583 25892254.4 25777501.1 3645357.2 387 1
## [96,] 159.981 152.572 167.431 25036145.1 24718131.1 3548047.5 120 1
## [97,] 159.914 150.587 169.171 24522144.6 24116903.9 3376330.8 118 1
## [98,] 158.972 152.512 166.361 23684956.7 23193164.4 3408130.8 76 1
## [99,] 159.899 149.684 169.130 24509083.6 24086197.8 3073721.5 104 1
## [100,] 160.436 152.991 167.098 23734787.3 23615887.2 3273927.0 281 1
## [101,] 159.252 152.592 165.693 24449753.7 24293631.9 3517919.8 219 1
## [102,] 159.874 152.430 166.419 21587369.0 20936126.4 3057596.5 53 1
## [103,] 158.562 152.930 160.426 4639544.5 4590319.3 1498079.8 159 1
## [104,] 159.561 152.131 166.994 23707741.0 23406642.9 3222981.5 96 1
## [105,] 160.003 150.861 169.066 1777843.4 1751181.6 215090.7 55 1
## [106,] 159.038 152.513 166.489 21622904.0 21141274.8 3017812.0 65 1
## [107,] 159.833 150.392 169.230 24125579.4 23606651.2 3117054.0 100 1
## [108,] 159.541 152.041 167.293 4935142.2 4794773.8 641919.9 54 1
## [109,] 158.956 151.372 170.253 16741040.5 16105906.1 2504247.2 54 1
## [110,] 159.434 152.966 165.957 21252553.4 20820611.2 3141830.0 70 1
## [111,] 159.365 152.689 166.869 21967069.3 21710590.0 2907608.8 130 1
## [112,] 160.551 152.633 165.780 1602336.4 1599585.6 216269.7 287 1
## [113,] 160.390 153.894 166.944 20036940.0 19861144.0 2846487.2 116 1
## [114,] 159.822 153.172 166.452 18236601.9 17744526.0 3163896.0 61 1
## [115,] 159.753 153.093 165.290 17989053.7 17622457.8 3130340.5 66 1
## [116,] 160.201 152.639 167.593 22194231.7 22041848.6 2769894.5 243 1
## [117,] 159.883 153.359 167.305 22298583.4 22191762.4 2831927.0 219 1
## column
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 3
## [5,] 4
## [6,] 5
## [7,] 6
## [8,] 7
## [9,] 9
## [10,] 10
## [11,] 11
## [12,] 11
## [13,] 12
## [14,] 12
## [15,] 13
## [16,] 14
## [17,] 15
## [18,] 16
## [19,] 17
## [20,] 18
## [21,] 18
## [22,] 19
## [23,] 20
## [24,] 21
## [25,] 22
## [26,] 23
## [27,] 23
## [28,] 24
## [29,] 25
## [30,] 26
## [31,] 27
## [32,] 27
## [33,] 28
## [34,] 29
## [35,] 30
## [36,] 31
## [37,] 31
## [38,] 32
## [39,] 33
## [40,] 34
## [41,] 35
## [42,] 36
## [43,] 37
## [44,] 38
## [45,] 39
## [46,] 40
## [47,] 41
## [48,] 42
## [49,] 43
## [50,] 44
## [51,] 45
## [52,] 46
## [53,] 47
## [54,] 48
## [55,] 49
## [56,] 50
## [57,] 51
## [58,] 52
## [59,] 53
## [60,] 54
## [61,] 55
## [62,] 56
## [63,] 57
## [64,] 58
## [65,] 59
## [66,] 60
## [67,] 61
## [68,] 62
## [69,] 63
## [70,] 63
## [71,] 64
## [72,] 65
## [73,] 65
## [74,] 66
## [75,] 67
## [76,] 68
## [77,] 69
## [78,] 70
## [79,] 71
## [80,] 72
## [81,] 73
## [82,] 74
## [83,] 75
## [84,] 76
## [85,] 77
## [86,] 78
## [87,] 79
## [88,] 80
## [89,] 81
## [90,] 82
## [91,] 83
## [92,] 84
## [93,] 85
## [94,] 86
## [95,] 87
## [96,] 88
## [97,] 89
## [98,] 90
## [99,] 91
## [100,] 92
## [101,] 93
## [102,] 94
## [103,] 95
## [104,] 96
## [105,] 97
## [106,] 98
## [107,] 99
## [108,] 100
## [109,] 101
## [110,] 102
## [111,] 103
## [112,] 104
## [113,] 105
## [114,] 106
## [115,] 107
## [116,] 108
## [117,] 109
#Plotting detected peaks
sample_colors <- group_colors[xchr_tryptophan$class]
plot(xchr_tryptophan, col = sample_colors, peakBg = sample_colors[chromPeaks(xchr_tryptophan)[, "column"]])
#FIGURE 16
#Nice looking green/red, uniform and symmetrical peaks
#Performing chromatographic peak detection in the whole demo dataset using optimized peak detection settings from authors (NOT ONLINE xcms or default).
#Setting snthreh = 1000 to speed-up the entire data analysis process:
cw_authorsxcms_default <- CentWaveParam(ppm = 5, peakwidth = c(5, 15),
snthresh = 1000, prefilter = c(3, 5000),
mzCenterFun = "wMean", integrate = 1L,
mzdiff = -0.001, fitgauss = FALSE,
noise = 0, verboseColumns = FALSE,
roiList = list(), firstBaselineCheck = TRUE,
roiScales = numeric())
xdata <- findChromPeaks(raw_data, param = cw_authorsxcms_default)
##
## This is xcms version 4.2.1
##
##
## Attaching package: 'xcms'
##
## The following object is masked from 'package:stats':
##
## sigma
##
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16934 regions of interest ... OK: 6675 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17267 regions of interest ... OK: 4814 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17494 regions of interest ... OK: 5214 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17653 regions of interest ... OK: 5112 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17002 regions of interest ... OK: 5166 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16032 regions of interest ... OK: 5785 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16568 regions of interest ... OK: 6312 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 13121 regions of interest ... OK: 7758 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16486 regions of interest ... OK: 6116 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17140 regions of interest ... OK: 4115 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17662 regions of interest ... OK: 5606 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17829 regions of interest ... OK: 5486 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16298 regions of interest ... OK: 6180 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16581 regions of interest ... OK: 5330 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17436 regions of interest ... OK: 6391 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 19673 regions of interest ... OK: 5166 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16178 regions of interest ... OK: 5546 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18513 regions of interest ... OK: 6034 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17672 regions of interest ... OK: 4804 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16984 regions of interest ... OK: 5136 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17094 regions of interest ... OK: 4764 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17745 regions of interest ... OK: 5780 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16666 regions of interest ... OK: 4984 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16436 regions of interest ... OK: 5946 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16193 regions of interest ... OK: 4994 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 15766 regions of interest ... OK: 5427 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16473 regions of interest ... OK: 4546 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 15933 regions of interest ... OK: 3552 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 15644 regions of interest ... OK: 4655 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17178 regions of interest ... OK: 3859 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17287 regions of interest ... OK: 5496 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18229 regions of interest ... OK: 5497 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 14897 regions of interest ... OK: 6124 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 20141 regions of interest ... OK: 4486 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 15194 regions of interest ... OK: 4535 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18998 regions of interest ... OK: 6435 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16119 regions of interest ... OK: 6098 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 21486 regions of interest ... OK: 5084 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 19235 regions of interest ... OK: 5156 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 19972 regions of interest ... OK: 5472 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16578 regions of interest ... OK: 6119 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16518 regions of interest ... OK: 5130 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 15819 regions of interest ... OK: 4567 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 21079 regions of interest ... OK: 5844 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17683 regions of interest ... OK: 3853 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16064 regions of interest ... OK: 4747 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17718 regions of interest ... OK: 5895 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17322 regions of interest ... OK: 5018 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17320 regions of interest ... OK: 5383 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17939 regions of interest ... OK: 6122 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16819 regions of interest ... OK: 4334 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16145 regions of interest ... OK: 5517 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16230 regions of interest ... OK: 6218 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17360 regions of interest ... OK: 4890 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17205 regions of interest ... OK: 5393 found.
##
## This is xcms version 4.2.1
##
##
## Attaching package: 'xcms'
##
## The following object is masked from 'package:stats':
##
## sigma
##
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16310 regions of interest ... OK: 4954 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16653 regions of interest ... OK: 5012 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 15751 regions of interest ... OK: 4005 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 13676 regions of interest ... OK: 5822 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16794 regions of interest ... OK: 6262 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17186 regions of interest ... OK: 4861 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16441 regions of interest ... OK: 5129 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16069 regions of interest ... OK: 6325 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16542 regions of interest ... OK: 5344 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16366 regions of interest ... OK: 6103 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 15231 regions of interest ... OK: 5415 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17190 regions of interest ... OK: 3485 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 19038 regions of interest ... OK: 5610 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16387 regions of interest ... OK: 5461 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18340 regions of interest ... OK: 4529 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 21949 regions of interest ... OK: 5333 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 20193 regions of interest ... OK: 6420 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 22247 regions of interest ... OK: 5828 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16327 regions of interest ... OK: 4426 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16726 regions of interest ... OK: 4938 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18143 regions of interest ... OK: 755 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 22094 regions of interest ... OK: 720 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 12160 regions of interest ... OK: 5288 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 12304 regions of interest ... OK: 5263 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 12433 regions of interest ... OK: 5295 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 12595 regions of interest ... OK: 4741 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 12708 regions of interest ... OK: 5353 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 12976 regions of interest ... OK: 6074 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 13860 regions of interest ... OK: 5523 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17648 regions of interest ... OK: 5757 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17636 regions of interest ... OK: 5841 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17667 regions of interest ... OK: 6401 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17998 regions of interest ... OK: 5974 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18068 regions of interest ... OK: 5314 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18012 regions of interest ... OK: 5470 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17340 regions of interest ... OK: 5970 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17551 regions of interest ... OK: 6999 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17580 regions of interest ... OK: 5686 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17505 regions of interest ... OK: 5592 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17436 regions of interest ... OK: 5437 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18735 regions of interest ... OK: 4915 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16843 regions of interest ... OK: 5415 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17825 regions of interest ... OK: 5347 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17760 regions of interest ... OK: 5766 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 16466 regions of interest ... OK: 5435 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17415 regions of interest ... OK: 6414 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17262 regions of interest ... OK: 4967 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17042 regions of interest ... OK: 5655 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 12978 regions of interest ... OK: 3504 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17698 regions of interest ... OK: 4623 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 17879 regions of interest ... OK: 5664 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18253 regions of interest ... OK: 5055 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18154 regions of interest ... OK: 6039 found.
## Detecting mass traces at 5 ppm ... OK
## Detecting chromatographic peaks in 18512 regions of interest ... OK: 5429 found.
#XCMS STEP 2: ALIGNMENT
#The time at which analytes elute from the chromatographic column (rt) can vary between samples (and even compounds).
#The alignment step (retention time correction) adjusts for these differences by shifting signals
#along the retention time axis and aligning them between different samples within an experiment
#Aim: adjust shifts in retention times between samples.
#Function: adjustRtime.
#Available methods:
#obiwarp (ObiwarpParam) (Prince and Marcotte 2006): warps the (full) data to a reference sample.
#peakGroups (PeakGroupsParam) (Smith et al. 2006): align spectra from different samples based on hook peaks.
#Need to define the hook peaks first: peaks present in most/all samples.
#We here align samples using obiwarp with authors (not online) xcms parameters
#According to authors' cited publication: "The retention time between samples were aligned employing the obiwarp algorithm
#(parameters: binSize = 0.6, distFun = "cor_opt") with a previous alignment of pooled QC samples followed by the alignment of the samples".
#Defining author-specified settings for the alignment (initially with no subset of QC samples):
#xdata_adj <- adjustRtime(xdata, param = ObiwarpParam(binSize = 0.6, distFun = "cor_opt")
#Executing this code unfortunately generated the following error using default system centerSample=55
#Stop worker failed with the error: wrong args for environment subassignment
#Error: BiocParallel errors
#1 remote errors, element index: 23
#85 unevaluated and other errors
#first remote error:
# Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection
# In addition: Warning messages:1: In serialize(data, node$con) :'package:sta 3: In serialize(data, node$con) :'package:stats' may not be available when loading
# may not be available when loading 2: In serialize(data, node$con) : 'package:stats' may not be available when loading
#Therefore, I used alternative SUBSET-BASED alignment that reflects the authors' mention of alignment using pooled QC samples initially instead:
#I specify a subset of QC samples that based on boxplot of total ion current looked OK and not as outliers, as well as use subset adjust=previous as
#authors mention QC samples adjustment preceded those of regular samples:
#For some experiments it might be better to perform the alignment based on only a subset of the available samples,
#e.g. if pooled QC samples were injected at regular intervals or if the experiment contains blanks.
#All alignment methods in xcms support such a subset-based alignment in which retention time shifts are estimated on
#only a specified subset of samples followed by an alignment of the whole data set based on the aligned subset.
#The subset of samples for such an alignment can be specified with the parameter subset of the PeakGroupsParam or ObiwarpParam object
#Parameter subsetAdjust allows to specify the method by which the left-out samples will be adjusted. There are currently two options available:
#subsetAdjust = "previous": adjust the retention times of a non-subset sample based on the alignment results of the previous subset sample (e.g. a QC sample).
#This approach requires a meaningful/correct ordering of the samples within the object (i.e., samples should be ordered by injection index).
xdata_adj <- adjustRtime(xdata, param = ObiwarpParam(binSize = 0.6, distFun = "cor_opt", subset=c(91,92,93,94,95), subsetAdjust="previous"))
## Sample number 3 used as center sample.
## Aligning 041_QC.mzML against 052_QC.mzML ... OK
## Aligning 047_QC.mzML against 052_QC.mzML ... OK
## Aligning 056_QC.mzML against 052_QC.mzML ... OK
## Aligning 062_QC.mzML against 052_QC.mzML ... OK
## Aligning sample number 1 against subset ... OK
## Aligning sample number 2 against subset ... OK
## Aligning sample number 3 against subset ... OK
## Aligning sample number 4 against subset ... OK
## Aligning sample number 5 against subset ... OK
## Aligning sample number 6 against subset ... OK
## Aligning sample number 7 against subset ... OK
## Aligning sample number 8 against subset ... OK
## Aligning sample number 9 against subset ... OK
## Aligning sample number 10 against subset ... OK
## Aligning sample number 11 against subset ... OK
## Aligning sample number 12 against subset ... OK
## Aligning sample number 13 against subset ... OK
## Aligning sample number 14 against subset ... OK
## Aligning sample number 15 against subset ... OK
## Aligning sample number 16 against subset ... OK
## Aligning sample number 17 against subset ... OK
## Aligning sample number 18 against subset ... OK
## Aligning sample number 19 against subset ... OK
## Aligning sample number 20 against subset ... OK
## Aligning sample number 21 against subset ... OK
## Aligning sample number 22 against subset ... OK
## Aligning sample number 23 against subset ... OK
## Aligning sample number 24 against subset ... OK
## Aligning sample number 25 against subset ... OK
## Aligning sample number 26 against subset ... OK
## Aligning sample number 27 against subset ... OK
## Aligning sample number 28 against subset ... OK
## Aligning sample number 29 against subset ... OK
## Aligning sample number 30 against subset ... OK
## Aligning sample number 31 against subset ... OK
## Aligning sample number 32 against subset ... OK
## Aligning sample number 33 against subset ... OK
## Aligning sample number 34 against subset ... OK
## Aligning sample number 35 against subset ... OK
## Aligning sample number 36 against subset ... OK
## Aligning sample number 37 against subset ... OK
## Aligning sample number 38 against subset ... OK
## Aligning sample number 39 against subset ... OK
## Aligning sample number 40 against subset ... OK
## Aligning sample number 41 against subset ... OK
## Aligning sample number 42 against subset ... OK
## Aligning sample number 43 against subset ... OK
## Aligning sample number 44 against subset ... OK
## Aligning sample number 45 against subset ... OK
## Aligning sample number 46 against subset ... OK
## Aligning sample number 47 against subset ... OK
## Aligning sample number 48 against subset ... OK
## Aligning sample number 49 against subset ... OK
## Aligning sample number 50 against subset ... OK
## Aligning sample number 51 against subset ... OK
## Aligning sample number 52 against subset ... OK
## Aligning sample number 53 against subset ... OK
## Aligning sample number 54 against subset ... OK
## Aligning sample number 55 against subset ... OK
## Aligning sample number 56 against subset ... OK
## Aligning sample number 57 against subset ... OK
## Aligning sample number 58 against subset ... OK
## Aligning sample number 59 against subset ... OK
## Aligning sample number 60 against subset ... OK
## Aligning sample number 61 against subset ... OK
## Aligning sample number 62 against subset ... OK
## Aligning sample number 63 against subset ... OK
## Aligning sample number 64 against subset ... OK
## Aligning sample number 65 against subset ... OK
## Aligning sample number 66 against subset ... OK
## Aligning sample number 67 against subset ... OK
## Aligning sample number 68 against subset ... OK
## Aligning sample number 69 against subset ... OK
## Aligning sample number 70 against subset ... OK
## Aligning sample number 71 against subset ... OK
## Aligning sample number 72 against subset ... OK
## Aligning sample number 73 against subset ... OK
## Aligning sample number 74 against subset ... OK
## Aligning sample number 75 against subset ... OK
## Aligning sample number 76 against subset ... OK
## Aligning sample number 77 against subset ... OK
## Aligning sample number 78 against subset ... OK
## Aligning sample number 79 against subset ... OK
## Aligning sample number 80 against subset ... OK
## Aligning sample number 81 against subset ... OK
## Aligning sample number 82 against subset ... OK
## Aligning sample number 83 against subset ... OK
## Aligning sample number 84 against subset ... OK
## Aligning sample number 85 against subset ... OK
## Aligning sample number 86 against subset ... OK
## Aligning sample number 87 against subset ... OK
## Aligning sample number 88 against subset ... OK
## Aligning sample number 89 against subset ... OK
## Aligning sample number 90 against subset ... OK
## Aligning sample number 96 against subset ... OK
## Aligning sample number 97 against subset ... OK
## Aligning sample number 98 against subset ... OK
## Aligning sample number 99 against subset ... OK
## Aligning sample number 100 against subset ... OK
## Aligning sample number 101 against subset ... OK
## Aligning sample number 102 against subset ... OK
## Aligning sample number 103 against subset ... OK
## Aligning sample number 104 against subset ... OK
## Aligning sample number 105 against subset ... OK
## Aligning sample number 106 against subset ... OK
## Aligning sample number 107 against subset ... OK
## Aligning sample number 108 against subset ... OK
## Aligning sample number 109 against subset ... OK
## Apply retention time correction performed on MS1 to spectra from all MS levels
## Applying retention time adjustment to the identified chromatographic peaks ... OK
#This successfully adjusted retention time. I explore the impact of this adjustment. Inspecting difference between raw and adjusted retention times.
plotAdjustedRtime(xdata_adj, col = group_colors[xdata$class])
#FIGURE 17
#Evaluate impact of the alignment on previously extracted tryptophan peak. Plotting XIC for tryptophan before and after alignment:
#Implement the retention time alignment when missalignments across samples are over ~ 20-30 seconds which is the accepted peak width
#for LC chromatographic peaks. Again, recalling values for rtr and mzr for filtering tryptophan peak:
M <- 204.089877638
H <- 1.007276
M_H <- M-H
## Define tryptophan peak mz range considering 10 ppm error
error <- 10 #ppm
mmu_min <- M_H-(error*M_H/1e6)
mmu_max <- M_H+(error*M_H/1e6)
mzr <- c(mmu_min, mmu_max)
#Define tryptophan peak rt range width with minute to second unit conversion
apex.tryptophan<-2.99*60
rt.window <- cw_authorsxcms_default@peakwidth[2] #max rt width in seconds
rtr <- c(apex.tryptophan-rt.window, apex.tryptophan+rt.window)
#Using adjustedRtime parameter to access raw/adjusted retention times
par(mfrow = c(1, 2), mar = c(4, 4.5, 0.9, 0.5))
plot(chromatogram(xdata, mz = mzr,rt = rtr, adjustedRtime = FALSE), col = group_colors[xdata$class], peakType = "none", lwd = 2,main = "Before alignment")
plot(chromatogram(xdata_adj, mz = mzr, rt = rtr),col = group_colors[xdata$class], peakType = "none", lwd = 2,main = "After alignment")
#FIGURE 18
#Evidently, this did not adjust enough to center the peak.
#We now try again with different retention time rt window previously used for generating nice tryptophan XIC:
apex<-2.99*60
ahigh<-apex+50
alow<-apex-50
ahigh
## [1] 229.4
alow
## [1] 129.4
rt = c(alow, ahigh)
par(mfrow = c(1, 2), mar = c(4, 4.5, 0.9, 0.5))
plot(chromatogram(xdata, mz = mzr,rt = rt, adjustedRtime = FALSE), col = group_colors[xdata$class], peakType = "none", lwd = 2, main = "Before alignment")
plot(chromatogram(xdata_adj, mz = mzr, rt = rt), col = group_colors[xdata$class], peakType = "none", lwd = 2, main = "After alignment")
#FIGURE 19
#This did not result in significant difference in envelope.
#I will not keep these retention time adjustment settings:
#I drop these obiwarp alignment adjustments and restore original retention times as follows:
xdata_adj <- dropAdjustedRtime(xdata_adj)
## Reverting retention times of identified peaks to original values ... OK
hasAdjustedRtime(xdata_adj)
## [1] FALSE
#XCMS Step 3: xcms Correspondence
#Correspondence is usually the final step in LC-MS data pre-processing in which data, presumably representing signal
#from the same originating ions, is matched across samples
#Aim: group signal (peaks) from the same ion across samples.
#Function: groupChromPeaks.
#Methods available:
#peak density (PeakDensityParam) (Smith et al. 2006).
#nearest (NearestPeaksParam)
#Iterate through slices along m/z. Within these small slices along the m/z dimension, the algorithm combines
#chromatographic peaks depending on the density of these peaks along the retention time axis: i.e.,
#it compares retention times of peaks within each slice and group peaks if they are close.Distribution of peaks along retention time axis
#is used to define those peaks to group. plotChromPeakDensity: plot distribution of identified peaks along rt for a given m/z slice;
#simulates correspondence analysis.
#Peak density parameters:
#bw is the most important parameter. It defines the smoothness of the density function
#binSize: m/z width of the data slice in which peaks are grouped. It should be small enough to avoid peaks
#from different ions measured at similar retention times to be grouped together
#maxFeatures: maximum number of features to be defined in one bin.
#minFraction: minimum proportion of samples (of one group!) for which a peak has to be present.
#minSamples: minimum number of samples a peak has to be present.
#Set on-line xcms parameters for peak density using PeakDensityParam
#According to cited publication's authors: The detected peaks were grouped to features with a minimum threshold
#of 40% in at least one experimental group (parameters: bw = 1, minFraction = 0.4, minSamples = 5).
#Features missing a peak in certain samples were integrated in the respective retention time width of the feature. The resulting
#feature list was further processed by the R package CAMERA (1.48)81 (parameters: perfwhm = 0.6, ppm = 5),
#creating adduct and isotopologue annotations. '
#Setting authors' (not xcms online) xcms parameters for peak density using PeakDensityParam:
## Set on-line xcms default parameters for peak density.
authorsxcms_pdp <- PeakDensityParam(sampleGroups = xdata$class,bw = 1,minFraction = 0.4,binSize = 0.015,minSamples = 5,maxFeatures = 100)
#Adjusting peak density parameters. Plotting data for the m/z slice containing the tryptophan peak.
#Using plotChromPeakDensity to simulate a correspondence analysis in the same slice using on-line xcms parameters for peak density.
#Determining if they fit to tryptophan:
## Extract a BPC for an m/z slice containing tryptophan
par(mfrow = c(3, 1), mar = c(4, 4.3, 1, 0.5))
plot(chromatogram(xdata, mz = mzr, aggregationFun = "max"))
#FIGURE 20
highlightChromPeaks(xdata, mz = mzr, whichPeaks = "apex_within")
## Dry-run correspondence and show the results.
plotChromPeakDensity(xdata, mz = mzr, type = "apex_within", param = authorsxcms_pdp, main = "bw = 1")
## Warning in .local(object, ...): Use of 'plotChromPeakDensity' on 'XCMSnExp'
## isdiscouraged. Please extract chromatographic data first and call
## 'plotChromPeakDensity' directly on the 'XChromatograms' object. See
## ?XChromatograms, section 'Correspondence analysis' for more details.
#FIGURE 21
## Increase bw to 20
authorsxcms_pdp2 <- PeakDensityParam(sampleGroups = xdata$class,bw = 20,minFraction = 0.4,binSize = 0.015,minSamples = 5,maxFeatures = 100)
plotChromPeakDensity(xdata, mz = mzr, type = "apex_within", param = authorsxcms_pdp2, main = "bw = 20")
## Warning in .local(object, ...): Use of 'plotChromPeakDensity' on 'XCMSnExp'
## isdiscouraged. Please extract chromatographic data first and call
## 'plotChromPeakDensity' directly on the 'XChromatograms' object. See
## ?XChromatograms, section 'Correspondence analysis' for more details.
#FIGURE 22
#In the top figure above (to panel) points are peaks per sample; Black line: peak density distribution; Grey rectangles: grouped peaks (features).
#The second bw=20 plots illustrates a failed correspondence shows a too relaxed bw parameter where all peaks are grouped in to a single
#feature (bottom panel)and how reducing to bw=1 using authors settings (middle panel) enabled grouping of corresponding peaks into different features
#Performing correspondence analysis using optimized peak density settings on on-line xcms. Plotting detected features
xdata <- groupChromPeaks(xdata, param = authorsxcms_pdp)
## Plotting detected features
plot(featureDefinitions(xdata)$rtmed, featureDefinitions(xdata)$mzmed,
xlab = "retention time", ylab = "m/z", main = "features",
col = "#00000080", pch = 21, bg = "#00000040")
grid()
#FIGURE 23
#Evidently, only one color (class) had tryptophan feature
#The featureValues method returns a matrix [features × samples] with features’ abundance estimates.
#This is then generally used as the intensity matrix for downstream analysis
#featureValues parameters:
#value: name of the column in chromPeaks that should be returned, i.e., the default value is “index” which simply return the
#index of peaks in the chromPeaks matrix assigned to each feature. If value = "into" is used, integrated peaks intensities are returned.
#If value = "maxo" maximum peak intensities are returned.
#method: character specifying the method to resolve multi-peak mappings within the same sample, i.e. to define the representative peak
#for a feature in samples where more than one peak was assigned to the feature. If “medret”: select the peak closest to the median retention
#time of the feature. If “maxint”: select the peak yielding the largest signal. If “sum”: sum the values (only if value is “into” or “maxo”).
#Using featureValues to extract the integrated peak intensity per feature/sample used to later answer EXERCISE 3.2
## Getting feature intensity matrix and its dimension
fmat <- featureValues(xdata, value = "maxo", method = "maxint")
dim(fmat)
## [1] 4598 109
#4598 features for 109 samples
#XCMS STEP 4: MISSING VALUE GAP-FILLING:
head(fmat)
## 025_ID_13.mzML 029_ID_15.mzML 034_ID_18.mzML 037_ID_19.mzML
## FT0001 27843.01 34479.62 26524.95 31017.79
## FT0002 29076.42 29702.36 31349.07 28079.20
## FT0003 29076.42 29702.36 29727.96 29338.69
## FT0004 26934.32 33556.23 29479.78 22214.32
## FT0005 30748.50 NA 19809.54 32304.58
## FT0006 14600.60 NA 34103.10 NA
## 040_ID_20.mzML 042_ID_21.mzML 043_ID_22.mzML 048_ID_23.mzML
## FT0001 29744.34 23243.87 24546.83 23198.58
## FT0002 NA NA NA 27748.72
## FT0003 28429.97 30079.15 31342.59 27748.72
## FT0004 33773.66 26768.54 26492.21 NA
## FT0005 NA NA 29406.17 26541.38
## FT0006 NA NA NA 23652.91
## 049_ID_24.mzML 051_ID_25.mzML 054_ID_26.mzML 057_ID_27.mzML
## FT0001 29726.91 25809.96 26738.88 33535.77
## FT0002 29015.79 33210.45 29757.90 29563.08
## FT0003 31008.39 28361.93 30499.21 23022.32
## FT0004 27248.20 26692.89 29453.42 24327.60
## FT0005 26831.48 NA NA NA
## FT0006 NA NA 28551.57 30104.25
## 061_ID_28.mzML 064_ID_29.mzML 066_ID_30.mzML 067_ID_31.mzML
## FT0001 28604.88 27813.90 21994.74 29004.81
## FT0002 28701.80 30931.46 28451.79 24885.24
## FT0003 25018.85 29137.00 31448.89 28631.66
## FT0004 28604.88 27813.90 28291.45 29004.81
## FT0005 NA 25011.01 25307.34 NA
## FT0006 NA NA NA 26374.29
## 073_ID_32.mzML 076_ID_34.mzML 078_ID_35.mzML 079_ID_36.mzML
## FT0001 24761.76 26703.18 26992.72 23201.10
## FT0002 30282.98 29291.85 27813.46 27855.41
## FT0003 25945.18 24370.65 24763.60 30402.71
## FT0004 24761.76 26703.18 25044.49 NA
## FT0005 27987.83 NA NA 30387.99
## FT0006 NA NA 10643.87 NA
## 084_ID_37.mzML 086_ID_39.mzML 089_ID_40.mzML 092_ID_43.mzML
## FT0001 26521.74 29207.36 29797.81 25072.23
## FT0002 NA 29443.10 NA 32000.62
## FT0003 27863.24 29443.10 27897.71 32000.62
## FT0004 12566.25 25629.41 25476.68 25072.23
## FT0005 27439.20 NA NA NA
## FT0006 NA 25459.08 27545.14 24329.84
## 094_ID_44.mzML 096_ID_46.mzML 098_ID_48.mzML 104_ID_49.mzML
## FT0001 24835.88 30670.75 23411.82 23646.72
## FT0002 25192.93 25704.10 23667.73 24310.68
## FT0003 25292.88 25613.32 23667.73 23095.89
## FT0004 31370.05 27793.85 22425.44 23646.72
## FT0005 23559.81 29109.70 26854.81 NA
## FT0006 NA NA 14309.44 22718.22
## 105_ID_50.mzML 106_ID_51.mzML 111_ID_53.mzML 027_ID_2.mzML 050_ID_7.mzML
## FT0001 25977.95 29362.71 29600.33 31771.25 26315.32
## FT0002 NA 26028.30 29901.60 28600.66 24838.22
## FT0003 26513.67 27879.70 29901.60 30838.32 27549.58
## FT0004 27382.94 29362.71 29600.33 25506.29 14589.31
## FT0005 NA NA 12339.34 28882.62 28731.40
## FT0006 NA 28037.84 NA NA NA
## 059_ID_9.mzML 081_ID_10.mzML 024_ID_55.mzML 044_ID_57.mzML
## FT0001 28765.64 25059.92 28969.41 24833.18
## FT0002 30175.77 NA 26413.20 27924.41
## FT0003 NA 27500.04 29589.01 29671.10
## FT0004 23727.16 25059.92 29323.01 27493.92
## FT0005 NA NA 31899.10 NA
## FT0006 26021.86 NA NA 30109.64
## 046_ID_59.mzML 053_ID_60.mzML 058_ID_61.mzML 065_ID_64.mzML
## FT0001 23055.42 31503.44 27649.44 22407.12
## FT0002 NA 25542.70 28015.03 27933.84
## FT0003 29769.39 25572.85 26822.60 30542.00
## FT0004 15419.41 31503.44 27421.98 29033.67
## FT0005 25040.85 29682.21 NA 13224.76
## FT0006 NA NA NA 29090.89
## 071_ID_68.mzML 074_ID_69.mzML 103_ID_67.mzML 110_ID_72.mzML
## FT0001 20162.79 26259.01 28590.463 22798.17
## FT0002 26718.18 28825.19 25618.637 NA
## FT0003 27340.30 28865.93 25511.271 27218.62
## FT0004 28078.35 24673.29 8548.301 22798.17
## FT0005 28892.01 26259.01 26408.814 NA
## FT0006 NA 25895.54 26835.020 28842.56
## 113_ID_73.mzML 115_ID_74.mzML 116_ID_75.mzML 028_ID_14.mzML
## FT0001 26577.551 22698.40 27698.91 28998.44
## FT0002 26398.295 27975.87 26559.39 28567.84
## FT0003 23493.902 25116.61 26605.74 32654.76
## FT0004 9656.163 NA 22873.20 28998.44
## FT0005 25689.816 NA NA 28480.63
## FT0006 11009.935 27266.39 24205.55 NA
## 030_ID_16.mzML 033_ID_17.mzML 075_ID_33.mzML 085_ID_38.mzML
## FT0001 26409.08 NA 27764.90 26664.57
## FT0002 NA NA 25138.19 27506.69
## FT0003 30501.60 18344.55 26669.65 29060.07
## FT0004 26409.08 15175.34 20259.33 30328.46
## FT0005 NA NA 23070.46 26789.80
## FT0006 NA 17061.30 NA NA
## 090_ID_41.mzML 091_ID_42.mzML 095_ID_45.mzML 097_ID_47.mzML
## FT0001 25241.66 22433.74 26439.947 25778.63
## FT0002 NA NA 27148.381 24010.79
## FT0003 28180.96 26759.40 27983.418 26803.51
## FT0004 27640.32 NA 23437.547 24460.62
## FT0005 NA 28597.33 9745.189 NA
## FT0006 24687.12 11317.37 NA NA
## 109_ID_52.mzML 114_ID_54.mzML 023_ID_1.mzML 036_ID_4.mzML 038_ID_5.mzML
## FT0001 28353.57 21820.24 31019.64 26018.84 25940.18
## FT0002 23511.16 25382.69 30808.99 25226.90 29617.47
## FT0003 28076.47 27616.15 30808.99 33481.34 25882.33
## FT0004 NA 26254.72 26969.84 28904.97 29745.66
## FT0005 NA 25556.65 32720.39 NA NA
## FT0006 26786.39 NA NA NA NA
## 039_ID_6.mzML 055_ID_8.mzML 083_ID_11.mzML 101_ID_12.mzML 108_ID_3.mzML
## FT0001 23518.36 27421.74 30407.82 22532.95 25327.12
## FT0002 29559.99 26216.22 NA 27700.67 NA
## FT0003 29391.04 31689.03 32706.89 28398.03 26658.76
## FT0004 25562.45 27899.82 30120.88 25289.98 25327.12
## FT0005 NA 14857.79 NA 27027.91 NA
## FT0006 NA NA NA 10967.88 NA
## 032_ID_56.mzML 045_ID_58.mzML 060_ID_62.mzML 063_ID_63.mzML
## FT0001 31359.18 30550.10 27156.01 28382.04
## FT0002 29406.09 33455.03 29644.10 29333.17
## FT0003 30947.31 26467.55 29644.10 29333.17
## FT0004 30675.06 30612.49 25464.76 28755.22
## FT0005 30476.69 30598.65 NA 27545.89
## FT0006 20874.53 20924.35 NA NA
## 069_ID_65.mzML 070_ID_66.mzML 080_ID_70.mzML 100_ID_71.mzML 009_QC.mzML
## FT0001 26893.69 29163.96 NA 27018.52 NA
## FT0002 30763.07 30116.00 NA NA NA
## FT0003 30763.07 30416.18 NA 27103.72 NA
## FT0004 27489.33 26641.31 NA 26081.21 NA
## FT0005 11981.47 NA NA NA NA
## FT0006 26520.24 11285.44 NA 23397.52 NA
## 010_QC.mzML 011_QC.mzML 012_QC.mzML 013_QC.mzML 015_QC.mzML 016_QC.mzML
## FT0001 NA 19679.03 21767.20 22631.83 NA NA
## FT0002 NA 20536.72 22474.01 23559.95 NA NA
## FT0003 NA 21045.86 19392.70 23559.95 NA NA
## FT0004 NA 18737.00 21767.20 22631.83 NA NA
## FT0005 NA 20017.95 NA 21774.06 NA NA
## FT0006 NA NA NA NA NA NA
## 017_QC.mzML 018_QC.mzML 020_QC.mzML 021_QC.mzML 022_QC.mzML 026_QC.mzML
## FT0001 23664.86 19737.88 28970.70 22141.49 28761.33 26219.70
## FT0002 24279.93 23305.87 NA 29037.53 29293.62 NA
## FT0003 20594.68 24781.99 27393.73 32620.65 31214.14 32010.24
## FT0004 NA 20981.41 27482.69 29057.40 27869.46 NA
## FT0005 NA 20451.69 29175.32 NA NA NA
## FT0006 23798.03 NA NA NA 31331.94 27810.80
## 031_QC.mzML 035_QC.mzML 041_QC.mzML 047_QC.mzML 052_QC.mzML 056_QC.mzML
## FT0001 31301.18 31761.80 24629.67 29636.32 25971.69 25482.86
## FT0002 NA 34522.49 30358.97 29415.61 23339.12 32762.91
## FT0003 31674.81 34522.49 26351.08 30141.38 28886.91 29733.02
## FT0004 NA NA 29379.61 24810.07 25565.06 29731.86
## FT0005 29180.90 NA 29286.12 28652.24 NA 31452.52
## FT0006 NA 30820.39 NA NA 25971.69 NA
## 062_QC.mzML 068_QC.mzML 072_QC.mzML 077_QC.mzML 082_QC.mzML 087_QC.mzML
## FT0001 23840.04 25000.77 28400.52 24859.31 26011.600 28085.58
## FT0002 26361.49 27841.78 NA 31584.44 25859.502 26309.75
## FT0003 27635.02 27841.78 17513.38 27788.01 26843.705 26309.75
## FT0004 23808.01 NA 15169.00 26228.12 9929.569 25011.10
## FT0005 NA NA NA NA NA 26048.21
## FT0006 30806.64 21355.58 NA 22659.35 25264.398 NA
## 088_QC.mzML 093_QC.mzML 099_QC.mzML 102_QC.mzML 107_QC.mzML 112_QC.mzML
## FT0001 28174.48 26141.03 21787.62 NA 25246.53 27079.80
## FT0002 NA NA 23969.16 NA 27069.00 30676.13
## FT0003 26307.55 23536.84 25727.46 NA 27069.00 26665.15
## FT0004 27713.18 10161.94 21787.62 NA 12037.21 26302.65
## FT0005 12304.73 28838.80 NA NA NA 31587.47
## FT0006 NA 28290.68 26131.56 NA 26159.38 NA
## 117_QC.mzML 118_QC.mzML 119_QC.mzML
## FT0001 25302.055 24631.85 27856.07
## FT0002 31897.518 29913.26 26703.67
## FT0003 31897.518 28671.08 27296.40
## FT0004 27320.213 24631.85 23961.30
## FT0005 NA 26352.79 25202.50
## FT0006 9782.741 NA NA
#Evidently, there are several missing values in this feature matrix. Peak detection may have failed in one sample, resulting in Peak feature matrix returning NA values.
#Ion is not present in a sample. Missing values are reported if in one sample no chromatographic peak was detected
#in the m/z - rt region of the feature. This does however not necessarily mean that there is no signal for that specific ion in that sample.
#The chromatographic peak detection algorithm could also just have failed to identify any peak in that region,
#e.g. because the signal was too noisy or too low. Thus after correspondence, it is recommended to perform a gap-filling step
#to handle missing values and avoid sparse matrices for subsequent statistical analysis.
#fillChromPeaks allows to fill-in signal for missing peaks from the feature area (defined by the median rt and mz of all peaks assigned to the feature).
#Data to fill-in are retrieved from original mzML files
#fillChromPeaks Parameters:
#expandMz, expandRt: expands the region from which signal is integrated in m/z or rt dimension.
#A value of 0 means no expansion, 1 means the region is grown by half of the feature’s m/z width on both sides.
#ppm: expand the m/z width by a m/z dependent value.
#Fill in missing values and compare the number of missing values before and after doing it.
##Obtaining missing values before filling in peaks
apply(featureValues(xdata, filled = FALSE), MARGIN = 2,FUN = function(z) sum(is.na(z)))
## 025_ID_13.mzML 029_ID_15.mzML 034_ID_18.mzML 037_ID_19.mzML 040_ID_20.mzML
## 2552 2784 2699 2648 2803
## 042_ID_21.mzML 043_ID_22.mzML 048_ID_23.mzML 049_ID_24.mzML 051_ID_25.mzML
## 2802 2430 2982 2738 3011
## 054_ID_26.mzML 057_ID_27.mzML 061_ID_28.mzML 064_ID_29.mzML 066_ID_30.mzML
## 2642 2591 2565 2783 2556
## 067_ID_31.mzML 073_ID_32.mzML 076_ID_34.mzML 078_ID_35.mzML 079_ID_36.mzML
## 3274 2411 2513 2776 2681
## 084_ID_37.mzML 086_ID_39.mzML 089_ID_40.mzML 092_ID_43.mzML 094_ID_44.mzML
## 2622 2659 2509 2628 2688
## 096_ID_46.mzML 098_ID_48.mzML 104_ID_49.mzML 105_ID_50.mzML 106_ID_51.mzML
## 2745 2692 3119 2686 3170
## 111_ID_53.mzML 027_ID_2.mzML 050_ID_7.mzML 059_ID_9.mzML 081_ID_10.mzML
## 2900 2792 2776 3159 2801
## 024_ID_55.mzML 044_ID_57.mzML 046_ID_59.mzML 053_ID_60.mzML 058_ID_61.mzML
## 2640 2866 2937 2903 2741
## 065_ID_64.mzML 071_ID_68.mzML 074_ID_69.mzML 103_ID_67.mzML 110_ID_72.mzML
## 2766 2950 2965 2776 2902
## 113_ID_73.mzML 115_ID_74.mzML 116_ID_75.mzML 028_ID_14.mzML 030_ID_16.mzML
## 2716 2482 2733 2878 2559
## 033_ID_17.mzML 075_ID_33.mzML 085_ID_38.mzML 090_ID_41.mzML 091_ID_42.mzML
## 3550 2696 2318 2773 2604
## 095_ID_45.mzML 097_ID_47.mzML 109_ID_52.mzML 114_ID_54.mzML 023_ID_1.mzML
## 2660 2704 2877 2857 2456
## 036_ID_4.mzML 038_ID_5.mzML 039_ID_6.mzML 055_ID_8.mzML 083_ID_11.mzML
## 2907 2825 2666 2701 2581
## 101_ID_12.mzML 108_ID_3.mzML 032_ID_56.mzML 045_ID_58.mzML 060_ID_62.mzML
## 2680 3218 2900 2880 3190
## 063_ID_63.mzML 069_ID_65.mzML 070_ID_66.mzML 080_ID_70.mzML 100_ID_71.mzML
## 2920 2783 2889 3227 2827
## 009_QC.mzML 010_QC.mzML 011_QC.mzML 012_QC.mzML 013_QC.mzML
## 4096 4127 2821 2710 2724
## 015_QC.mzML 016_QC.mzML 017_QC.mzML 018_QC.mzML 020_QC.mzML
## 2786 2753 2428 2518 2381
## 021_QC.mzML 022_QC.mzML 026_QC.mzML 031_QC.mzML 035_QC.mzML
## 2209 2044 2367 2390 2272
## 041_QC.mzML 047_QC.mzML 052_QC.mzML 056_QC.mzML 062_QC.mzML
## 2127 2228 2263 2225 2353
## 068_QC.mzML 072_QC.mzML 077_QC.mzML 082_QC.mzML 087_QC.mzML
## 2569 2917 2583 2255 2424
## 088_QC.mzML 093_QC.mzML 099_QC.mzML 102_QC.mzML 107_QC.mzML
## 2219 2308 2385 3228 2414
## 112_QC.mzML 117_QC.mzML 118_QC.mzML 119_QC.mzML
## 2344 2412 2210 2341
## Filling in peaks for long-duration
xdata <- fillChromPeaks(xdata)
## Defining peak areas for filling-in .... OK
## Start integrating peak areas from original files
## Requesting 2552 peaks from 025_ID_13.mzML ... got 2046.
## Requesting 2784 peaks from 029_ID_15.mzML ... got 2337.
## Requesting 2699 peaks from 034_ID_18.mzML ... got 2234.
## Requesting 2648 peaks from 037_ID_19.mzML ... got 2078.
## Requesting 2803 peaks from 040_ID_20.mzML ... got 1996.
## Requesting 2802 peaks from 042_ID_21.mzML ... got 2146.
## Requesting 2430 peaks from 043_ID_22.mzML ... got 1919.
## Requesting 2982 peaks from 048_ID_23.mzML ... got 1965.
## Requesting 2738 peaks from 049_ID_24.mzML ... got 2109.
## Requesting 3011 peaks from 051_ID_25.mzML ... got 2436.
## Requesting 2642 peaks from 054_ID_26.mzML ... got 2200.
## Requesting 2591 peaks from 057_ID_27.mzML ... got 2181.
## Requesting 2565 peaks from 061_ID_28.mzML ... got 2021.
## Requesting 2783 peaks from 064_ID_29.mzML ... got 2272.
## Requesting 2556 peaks from 066_ID_30.mzML ... got 1885.
## Requesting 3274 peaks from 067_ID_31.mzML ... got 2318.
## Requesting 2411 peaks from 073_ID_32.mzML ... got 1966.
## Requesting 2513 peaks from 076_ID_34.mzML ... got 1943.
## Requesting 2776 peaks from 078_ID_35.mzML ... got 2232.
## Requesting 2681 peaks from 079_ID_36.mzML ... got 2179.
## Requesting 2622 peaks from 084_ID_37.mzML ... got 2138.
## Requesting 2659 peaks from 086_ID_39.mzML ... got 2034.
## Requesting 2509 peaks from 089_ID_40.mzML ... got 1966.
## Requesting 2628 peaks from 092_ID_43.mzML ... got 2095.
## Requesting 2688 peaks from 094_ID_44.mzML ... got 2041.
## Requesting 2745 peaks from 096_ID_46.mzML ... got 2088.
## Requesting 2692 peaks from 098_ID_48.mzML ... got 2177.
## Requesting 3119 peaks from 104_ID_49.mzML ... got 2319.
## Requesting 2686 peaks from 105_ID_50.mzML ... got 2077.
## Requesting 3170 peaks from 106_ID_51.mzML ... got 2477.
## Requesting 2900 peaks from 111_ID_53.mzML ... got 1977.
## Requesting 2792 peaks from 027_ID_2.mzML ... got 1964.
## Requesting 2776 peaks from 050_ID_7.mzML ... got 1858.
## Requesting 3159 peaks from 059_ID_9.mzML ... got 2377.
## Requesting 2801 peaks from 081_ID_10.mzML ... got 2057.
## Requesting 2640 peaks from 024_ID_55.mzML ... got 2012.
## Requesting 2866 peaks from 044_ID_57.mzML ... got 2008.
## Requesting 2937 peaks from 046_ID_59.mzML ... got 2274.
## Requesting 2903 peaks from 053_ID_60.mzML ... got 2124.
## Requesting 2741 peaks from 058_ID_61.mzML ... got 1962.
## Requesting 2766 peaks from 065_ID_64.mzML ... got 2075.
## Requesting 2950 peaks from 071_ID_68.mzML ... got 2414.
## Requesting 2965 peaks from 074_ID_69.mzML ... got 2332.
## Requesting 2776 peaks from 103_ID_67.mzML ... got 1893.
## Requesting 2902 peaks from 110_ID_72.mzML ... got 2061.
## Requesting 2716 peaks from 113_ID_73.mzML ... got 2105.
## Requesting 2482 peaks from 115_ID_74.mzML ... got 2131.
## Requesting 2733 peaks from 116_ID_75.mzML ... got 2099.
## Requesting 2878 peaks from 028_ID_14.mzML ... got 2341.
## Requesting 2559 peaks from 030_ID_16.mzML ... got 2071.
## Requesting 3550 peaks from 033_ID_17.mzML ... got 2874.
## Requesting 2696 peaks from 075_ID_33.mzML ... got 1995.
## Requesting 2318 peaks from 085_ID_38.mzML ... got 1814.
## Requesting 2773 peaks from 090_ID_41.mzML ... got 2204.
## Requesting 2604 peaks from 091_ID_42.mzML ... got 2092.
##
## Requesting 2660 peaks from 095_ID_45.mzML ... got 2019.
## Requesting 2704 peaks from 097_ID_47.mzML ... got 2022.
## Requesting 2877 peaks from 109_ID_52.mzML ... got 1787.
## Requesting 2857 peaks from 114_ID_54.mzML ... got 2196.
## Requesting 2456 peaks from 023_ID_1.mzML ... got 1783.
## Requesting 2907 peaks from 036_ID_4.mzML ... got 2295.
## Requesting 2825 peaks from 038_ID_5.mzML ... got 1937.
## Requesting 2666 peaks from 039_ID_6.mzML ... got 2048.
## Requesting 2701 peaks from 055_ID_8.mzML ... got 1857.
## Requesting 2581 peaks from 083_ID_11.mzML ... got 1920.
## Requesting 2680 peaks from 101_ID_12.mzML ... got 1906.
## Requesting 3218 peaks from 108_ID_3.mzML ... got 2343.
## Requesting 2900 peaks from 032_ID_56.mzML ... got 2357.
## Requesting 2880 peaks from 045_ID_58.mzML ... got 2278.
## Requesting 3190 peaks from 060_ID_62.mzML ... got 2734.
## Requesting 2920 peaks from 063_ID_63.mzML ... got 2294.
## Requesting 2783 peaks from 069_ID_65.mzML ... got 1971.
## Requesting 2889 peaks from 070_ID_66.mzML ... got 2088.
## Requesting 3227 peaks from 080_ID_70.mzML ... got 2719.
## Requesting 2827 peaks from 100_ID_71.mzML ... got 2312.
## Requesting 4096 peaks from 009_QC.mzML ... got 3575.
## Requesting 4127 peaks from 010_QC.mzML ... got 3586.
## Requesting 2821 peaks from 011_QC.mzML ... got 1930.
## Requesting 2710 peaks from 012_QC.mzML ... got 1721.
## Requesting 2724 peaks from 013_QC.mzML ... got 1922.
## Requesting 2786 peaks from 015_QC.mzML ... got 1981.
## Requesting 2753 peaks from 016_QC.mzML ... got 2095.
## Requesting 2428 peaks from 017_QC.mzML ... got 1864.
## Requesting 2518 peaks from 018_QC.mzML ... got 2020.
## Requesting 2381 peaks from 020_QC.mzML ... got 2275.
## Requesting 2209 peaks from 021_QC.mzML ... got 2124.
## Requesting 2044 peaks from 022_QC.mzML ... got 1988.
## Requesting 2367 peaks from 026_QC.mzML ... got 2323.
## Requesting 2390 peaks from 031_QC.mzML ... got 2349.
## Requesting 2272 peaks from 035_QC.mzML ... got 2146.
## Requesting 2127 peaks from 041_QC.mzML ... got 2088.
## Requesting 2228 peaks from 047_QC.mzML ... got 2184.
## Requesting 2263 peaks from 052_QC.mzML ... got 2196.
## Requesting 2225 peaks from 056_QC.mzML ... got 2188.
## Requesting 2353 peaks from 062_QC.mzML ... got 2313.
## Requesting 2569 peaks from 068_QC.mzML ... got 2510.
## Requesting 2917 peaks from 072_QC.mzML ... got 2728.
## Requesting 2583 peaks from 077_QC.mzML ... got 2431.
## Requesting 2255 peaks from 082_QC.mzML ... got 2215.
## Requesting 2424 peaks from 087_QC.mzML ... got 2386.
## Requesting 2219 peaks from 088_QC.mzML ... got 2125.
## Requesting 2308 peaks from 093_QC.mzML ... got 2180.
## Requesting 2385 peaks from 099_QC.mzML ... got 2342.
## Requesting 3228 peaks from 102_QC.mzML ... got 2606.
## Requesting 2414 peaks from 107_QC.mzML ... got 2220.
## Requesting 2344 peaks from 112_QC.mzML ... got 2278.
## Requesting 2412 peaks from 117_QC.mzML ... got 2360.
## Requesting 2210 peaks from 118_QC.mzML ... got 2125.
## Requesting 2341 peaks from 119_QC.mzML ... got 2250.
## Getting missing values after filling in peaks
apply(featureValues(xdata), MARGIN = 2,FUN = function(z) sum(is.na(z)))
## 025_ID_13.mzML 029_ID_15.mzML 034_ID_18.mzML 037_ID_19.mzML 040_ID_20.mzML
## 506 447 465 570 807
## 042_ID_21.mzML 043_ID_22.mzML 048_ID_23.mzML 049_ID_24.mzML 051_ID_25.mzML
## 656 511 1017 629 575
## 054_ID_26.mzML 057_ID_27.mzML 061_ID_28.mzML 064_ID_29.mzML 066_ID_30.mzML
## 442 410 544 511 671
## 067_ID_31.mzML 073_ID_32.mzML 076_ID_34.mzML 078_ID_35.mzML 079_ID_36.mzML
## 956 445 570 544 502
## 084_ID_37.mzML 086_ID_39.mzML 089_ID_40.mzML 092_ID_43.mzML 094_ID_44.mzML
## 484 625 543 533 647
## 096_ID_46.mzML 098_ID_48.mzML 104_ID_49.mzML 105_ID_50.mzML 106_ID_51.mzML
## 657 515 800 609 693
## 111_ID_53.mzML 027_ID_2.mzML 050_ID_7.mzML 059_ID_9.mzML 081_ID_10.mzML
## 923 828 918 782 744
## 024_ID_55.mzML 044_ID_57.mzML 046_ID_59.mzML 053_ID_60.mzML 058_ID_61.mzML
## 628 858 663 779 779
## 065_ID_64.mzML 071_ID_68.mzML 074_ID_69.mzML 103_ID_67.mzML 110_ID_72.mzML
## 691 536 633 883 841
## 113_ID_73.mzML 115_ID_74.mzML 116_ID_75.mzML 028_ID_14.mzML 030_ID_16.mzML
## 611 351 634 537 488
## 033_ID_17.mzML 075_ID_33.mzML 085_ID_38.mzML 090_ID_41.mzML 091_ID_42.mzML
## 676 701 504 569 512
## 095_ID_45.mzML 097_ID_47.mzML 109_ID_52.mzML 114_ID_54.mzML 023_ID_1.mzML
## 641 682 1090 661 673
## 036_ID_4.mzML 038_ID_5.mzML 039_ID_6.mzML 055_ID_8.mzML 083_ID_11.mzML
## 612 888 618 844 661
## 101_ID_12.mzML 108_ID_3.mzML 032_ID_56.mzML 045_ID_58.mzML 060_ID_62.mzML
## 774 875 543 602 456
## 063_ID_63.mzML 069_ID_65.mzML 070_ID_66.mzML 080_ID_70.mzML 100_ID_71.mzML
## 626 812 801 508 515
## 009_QC.mzML 010_QC.mzML 011_QC.mzML 012_QC.mzML 013_QC.mzML
## 521 541 891 989 802
## 015_QC.mzML 016_QC.mzML 017_QC.mzML 018_QC.mzML 020_QC.mzML
## 805 658 564 498 106
## 021_QC.mzML 022_QC.mzML 026_QC.mzML 031_QC.mzML 035_QC.mzML
## 85 56 44 41 126
## 041_QC.mzML 047_QC.mzML 052_QC.mzML 056_QC.mzML 062_QC.mzML
## 39 44 67 37 40
## 068_QC.mzML 072_QC.mzML 077_QC.mzML 082_QC.mzML 087_QC.mzML
## 59 189 152 40 38
## 088_QC.mzML 093_QC.mzML 099_QC.mzML 102_QC.mzML 107_QC.mzML
## 94 128 43 622 194
## 112_QC.mzML 117_QC.mzML 118_QC.mzML 119_QC.mzML
## 66 52 85 91
#XCMS STEP 7: Review History
#XCMSnExp objects allow to capture all performed pre-processing steps along with the used parameter class within the (processHistory?) slot.
#Storing also the parameter class ensures the highest possible degree of analysis documentation and in future might enable to replay
#analyses or parts of it. The list of all performed preprocessings can be extracted using the processHistory method.
#Summary of xcms process
processHistory(xdata)
## [[1]]
## Object of class "XProcessHistory"
## type: Peak detection
## date: Fri Jun 14 11:11:39 2024
## info:
## fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109
## Parameter class: CentWaveParam
## MS level(s) 1
##
## [[2]]
## Object of class "XProcessHistory"
## type: Peak grouping
## date: Fri Jun 14 11:26:23 2024
## info:
## fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109
## Parameter class: PeakDensityParam
## MS level(s) 1
##
## [[3]]
## Object of class "XProcessHistory"
## type: Missing peak filling
## date: Fri Jun 14 11:27:58 2024
## info:
## fileIndex: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109
## Parameter class: FillChromPeaksParam
## MS level(s) 1
#EXERCISE 3.2: For the experimental conditions state: How many peaks were detected on average per sample? How many features were finally detected?
#The number of detected peaks was 815088. THe number of feature detected was 4598 features for 109 samples.
#found in 80% of the samples of at least one of the experimental groups. These results are supported by the following code:
chromPeaks(xdata) |>dim()
## [1] 815088 11
#The number of detected features was 4598
### Get feature information
featureDefinitions(xdata) |>dim()
## [1] 4598 16
feature.info <- featureDefinitions(xdata) |>as.data.frame()
#Obtain intensity values for each feature
D <- featureValues(xdata, value = "maxo", method = "maxint")|> as.data.frame()
fmat <- featureValues(xdata, value = "maxo", method = "maxint")
dim(fmat)
## [1] 4598 109
#Therefore 4598 features for 109 samples
4598/109
## [1] 42.18349
#42.18349 features per sample differs from later calculation
#Sample representation criteria:
# Compute number of samples per experimental group and minimum percent
n.samplespergroup <- table(pData(xdata)$class)
percent <- 0.8 #apply the desired percentage
samples.min <- round(n.samplespergroup*percent,0)
samples_per_group <- feature.info[,c(grep("npeaks", colnames(feature.info))+1:length(samples.min))]
samples_per_group_QC <- samples_per_group[colnames(samples_per_group) != "QC"]
## remove QCs
samples.min_QC <- samples.min[names(samples.min)!= "QC"]
resta <- apply(samples_per_group_QC, 1, function(x)
{any(x-as.vector(samples.min_QC)>0)}
)
idx_s <- rownames(D)[which(resta==TRUE)]
cat(length(idx_s), "out of", dim(D)[1], "found in 80% of the samples of at least one of the experimental groups")
## 814 out of 4598 found in 80% of the samples of at least one of the experimental groups
head(feature.info[idx_s,],5)
## mzmed mzmin mzmax rtmed rtmin rtmax npeaks CTR_day28
## FT0001 65.99609 65.99604 65.99614 4.735785 1.38937 9.00962 122 31
## FT0002 65.99610 65.99604 65.99615 32.522800 29.10560 36.70700 84 25
## FT0003 65.99612 65.99604 65.99860 40.400100 36.86330 44.86390 175 31
## FT0004 65.99612 65.99606 65.99855 14.936600 10.87560 18.32840 135 29
## FT0008 66.01156 66.01148 66.01790 5.505415 1.39262 11.17870 194 29
## CTR_day7 CTR_termage PAT_day28 PAT_day7 PAT_termage QC peakidx
## FT0001 4 13 10 8 7 29 180, 682....
## FT0002 3 11 7 6 6 23 184, 683....
## FT0003 3 13 11 8 7 29 182, 187....
## FT0004 4 12 9 8 7 24 183, 682....
## FT0008 3 12 10 8 7 22 190, 951....
## ms_level
## FT0001 1
## FT0002 1
## FT0003 1
## FT0004 1
## FT0008 1
#Minimum intensity threshold
## Computing the mean intensities for each group
class <- pData(xdata)$class
median.intensities <- as.data.frame(t(apply(D,1, function(x) tapply(x, class, median, na.rm = TRUE))))
median.intensities$QC <- NULL #remove QC group
# Establish intensity threshold value
thresholdvalue <- 10000
#Get number of features with mean intensity above certain threshold counts in at least one of the groups except QC group
idx_i <- names(which(apply(median.intensities,1, function(x) any(x>thresholdvalue)==TRUE)==TRUE))
cat(paste(length(idx_i), "out of", dim(median.intensities)[1]), "above the threshold intensity value")
## 4499 out of 4598 above the threshold intensity value
#EXERCISE 4.1.1 Use your feature matrix to reproduce the statistical analysis performed by authors as they state in the pre-print: #Do you get to the same PCA conclusions as in Figure 1B?
Figure 1B depicts PCA and Boxplots of the distance to the centroid of the combined (RP+HILIC/Negative+ Positive Polarity)data obtained from the untargeted LC-HRMS experiments of all sampling time points of fecal samples t proved experimental overview. Generally, the metabolome of the pathological and the control CTR PAT group exhibited no group-specific clustering in the PCA, neither in the plasma nor in the feces. However, there was clearly clustering between sampling time points, as seen in plasma and, in our specific feces samples.Time point was a significant, while group affiliation was not deemed significant (p<0.001). Individual factors dominated differences in metabolome between patients (PATIENT ID). Average euclidean. In fecal samples, slightly tighter clustering was shown at Day 28 along with a tendency to spread out at term equivalent age, indicating a diversification of the gut metabolome over time and possibly also the gut microbiome.
Less variance (15.54% +4.24%=19.78%) was evidently captured by my first 2 PCA components compared to (25.75%+7.13%=32.88%) prior to removal of QC class samples. Unlike the authors’ Figure 1B, for fecal samples, there does not appear to be clustering around sampling time (7days, 28days, termage) but there is clustering for QC class before it was removed as a variable.The approach taken for this involved following filtering criteria and explotaroty data analysis described below:
#Filtering criteria
#1.Samples representation in each feature: Retain just those features consistently found across i.e., 80% samples in at least one experimental group
#2.Minimum intensity threshold: Filter out those features below a certain intensity threshold. To ensure recording suitable MS/MS spectra for ID purposes parent
#ions should present a minimum intensity
#3.Analytical variability: Features should enclose more biological variability than analytical variability. Analytical variability is computed
#using QC samples, i.e., a pool of each individual sample entering the study and periodically analyzed along the worklist.
#Filter out those features with more analytical than biological variability
#4.Statistical Criteria: Retain those features enclosing significant variation according to our experimental design
#5.Library hits: Retain only mzRT returning a MS1 match with a library compound (i.e, HMDB)
#Exploratory data analysis to assess analytical variability and identify differences between treated, untreated and control samples using PCA.
#Computing PCA first with 4 classes (day28, day7, termage, QC)
#Filter out features
D2 <- D[intersect(idx_i, idx_s),]
## Row-wise normalization
D.norm <- as.data.frame(apply(D2, 1, function(x) (x/max(x, na.rm = TRUE))))
D.norm[is.na(D.norm)] <- 0
pca_data <- prcomp(D.norm,retx=TRUE, center=TRUE, scale=FALSE)
summary(pca_data)$importance[,c(1:3)]
## PC1 PC2 PC3
## Standard deviation 2.20438 1.159596 0.8441621
## Proportion of Variance 0.25754 0.071270 0.0377700
## Cumulative Proportion 0.25754 0.328800 0.3665700
D.norm.backup<-D.norm
## Computing PCA first with 4 classes (day28, day7, termage, QC)
D.norm.backup$class<-pheno$class
D.norm.backup$class2 <- str_remove(D.norm.backup$class, "PAT_")
D.norm.backup$class2 <- str_remove(D.norm.backup$class2, "CTR_")
test.pca.plot <- autoplot(pca_data, data = D.norm.backup, colour = "class2")
test.pca.plot
#Evidently, there 25.75%+7.13%=32.88% of total variance is captured by first 2 Principal components based on eigenvectors and values
#Filtering out analytical variability
#Function to calculate RSD%
RSD <- function(x){100*sd(x, na.rm = TRUE)/mean(x, na.rm = TRUE)}
## Define QC idx
QChits <- which(class == "QC")
## Compute RSDs across samples and QCs
RSD_samples <- apply(D[,-QChits],1, RSD)
RSD_QC <- apply(D[, QChits],1, RSD)
## Filter those features with RSD(QCs) > RSD (Samples)
idx_qc <- names(RSD_samples)[which(RSD_QC< RSD_samples)]
cat(paste(length(idx_qc), "out of", dim(median.intensities)[1]), "hold higher biological than analytical variation")
## 2493 out of 4598 hold higher biological than analytical variation
#2493 out of 4598 hold higher biological than analytical variation
#Multivariate Data analysis
# Merge intensity sample representation and QCs criteria
data2stats <- D[intersect(intersect(idx_i, idx_s),idx_qc),-c(grep("QC", class))]
colnames(data2stats) <- pData(xdata)$sample_name[which(pData(xdata)$class!="QC")]
# Row-wise normalization
D.norm <- as.data.frame(apply(data2stats, 1, function(x) (x/max(x, na.rm = TRUE))))
D.norm[is.na(D.norm)] <- 0
# Compute PCA
pca_data <- prcomp(D.norm,retx=TRUE, center=TRUE, scale=FALSE)
summary(pca_data)$importance[,c(1:3)]
## PC1 PC2 PC3
## Standard deviation 1.042971 0.8223921 0.7759304
## Proportion of Variance 0.134450 0.0835900 0.0744100
## Cumulative Proportion 0.134450 0.2180400 0.2924500
# Scores data
D.norm.backup2 <- D.norm.backup[!D.norm.backup$class2%in% c("QC"), ]
scores <- data.frame(pca_data$x[, c("PC1", "PC2")])
scores$class<-D.norm.backup2$class2
#scores$class <- cl[-c(grep("QC", class))]
scores$lab <- rownames(D.norm)
scores.plot <- ggplot(data = scores, aes(x = PC1, y = PC2, colour = class, label=lab)) +
geom_point(alpha = I(0.7), size = 4) +
geom_hline(yintercept = 0)+
geom_vline(xintercept = 0)+
xlab(paste("PC1 (", round(summary(pca_data)$importance[2,1], 2) * 100, "%)"))+
ylab(paste("PC2 (", round(summary(pca_data)$importance[2,2], 2) * 100, "%)"))+
stat_ellipse() +
theme_bw()
ggplotly(scores.plot)
## Warning: The following aesthetics were dropped during statistical transformation: label.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
#FIGURE 24 WARNING
#Since my plot did not display, I am again re-creating PCA of Figure 1B with only 3 time-point classes with no QC class like the authors:
#Remove QC rows
D.norm.backup2 <- D.norm.backup[!D.norm.backup$class2%in% c("QC"), ]
dim(D.norm.backup2)
## [1] 75 797
#only perform PCA on continuous variables (not categorical class or class2 columns)
pca_data2 <- prcomp(D.norm.backup2[,c(1:795)], center = TRUE,scale. = TRUE)
# summary of the prcomp object
summary(pca_data2)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 11.1142 5.80918 5.47244 4.96612 4.57723 4.46275 4.38570
## Proportion of Variance 0.1554 0.04245 0.03767 0.03102 0.02635 0.02505 0.02419
## Cumulative Proportion 0.1554 0.19783 0.23550 0.26652 0.29287 0.31792 0.34212
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 4.25241 4.08275 3.98678 3.90890 3.80256 3.73155 3.62014
## Proportion of Variance 0.02275 0.02097 0.01999 0.01922 0.01819 0.01752 0.01648
## Cumulative Proportion 0.36486 0.38583 0.40582 0.42504 0.44323 0.46075 0.47723
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 3.57630 3.52562 3.42158 3.41182 3.38006 3.35006 3.23664
## Proportion of Variance 0.01609 0.01564 0.01473 0.01464 0.01437 0.01412 0.01318
## Cumulative Proportion 0.49332 0.50895 0.52368 0.53832 0.55269 0.56681 0.57999
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 3.20587 3.14474 3.12363 3.0754 3.05711 3.02630 3.00089
## Proportion of Variance 0.01293 0.01244 0.01227 0.0119 0.01176 0.01152 0.01133
## Cumulative Proportion 0.59291 0.60535 0.61763 0.6295 0.64128 0.65280 0.66413
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 2.94721 2.92381 2.90934 2.87159 2.8197 2.80354 2.79527
## Proportion of Variance 0.01093 0.01075 0.01065 0.01037 0.0100 0.00989 0.00983
## Cumulative Proportion 0.67505 0.68581 0.69645 0.70683 0.7168 0.72671 0.73654
## PC36 PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 2.75780 2.75354 2.74205 2.71187 2.6891 2.65507 2.62562
## Proportion of Variance 0.00957 0.00954 0.00946 0.00925 0.0091 0.00887 0.00867
## Cumulative Proportion 0.74611 0.75565 0.76510 0.77435 0.7834 0.79232 0.80099
## PC43 PC44 PC45 PC46 PC47 PC48 PC49
## Standard deviation 2.60910 2.5995 2.52589 2.5054 2.49536 2.47919 2.4412
## Proportion of Variance 0.00856 0.0085 0.00803 0.0079 0.00783 0.00773 0.0075
## Cumulative Proportion 0.80955 0.8181 0.82608 0.8340 0.84180 0.84954 0.8570
## PC50 PC51 PC52 PC53 PC54 PC55 PC56
## Standard deviation 2.43342 2.40095 2.37036 2.34366 2.32373 2.32098 2.30290
## Proportion of Variance 0.00745 0.00725 0.00707 0.00691 0.00679 0.00678 0.00667
## Cumulative Proportion 0.86448 0.87173 0.87880 0.88571 0.89250 0.89928 0.90595
## PC57 PC58 PC59 PC60 PC61 PC62 PC63
## Standard deviation 2.26369 2.25356 2.22134 2.17232 2.14557 2.13478 2.09511
## Proportion of Variance 0.00645 0.00639 0.00621 0.00594 0.00579 0.00573 0.00552
## Cumulative Proportion 0.91239 0.91878 0.92499 0.93092 0.93671 0.94245 0.94797
## PC64 PC65 PC66 PC67 PC68 PC69 PC70
## Standard deviation 2.08128 2.04343 2.00293 2.00058 1.97761 1.94569 1.9335
## Proportion of Variance 0.00545 0.00525 0.00505 0.00503 0.00492 0.00476 0.0047
## Cumulative Proportion 0.95342 0.95867 0.96371 0.96875 0.97367 0.97843 0.9831
## PC71 PC72 PC73 PC74 PC75
## Standard deviation 1.89759 1.85339 1.81646 1.75321 8.58e-15
## Proportion of Variance 0.00453 0.00432 0.00415 0.00387 0.00e+00
## Cumulative Proportion 0.98766 0.99198 0.99613 1.00000 1.00e+00
#Only 75 cohort sample PCA eigenvalues/vectors show instead of total 109 samples
test.pca.plot2 <- autoplot(pca_data2, data = D.norm.backup2, colour = "class2")
test.pca.plot2
#FIGURE 25
#Less variance (15.54% +4.24%=19.78%) is evidently captured by first 2 PCA components
#compared to (25.75%+7.13%=32.88%) prior to removal of QC class samples.
#Unlike the authors' Figure 1B, for fecal samples, there does not appear to be clustering
#around sampling time (7days, 28days, termage) but there is clustering for QC class before it was removed as a variable'
#Univariate Data analysis
## Remove suspicious samples
#data2stats_no <- data2stats[,-c(3,21)]
#data2stats_no[is.na(data2stats_no)] <- 0
#cl_no <- cl[-c(3,21)]; cl_no <- cl_no[cl_no!='QC']
## Perform an ANOVA comparison for the treated, untreated and control groups
#gr <- as.factor(cl_no)
gr <- as.factor(D.norm.backup2$class2)
pm <- matrix(ncol=3,nrow=dim(data2stats[1]))
for(i in 1:nrow(data2stats)){
aov.out <- aov(as.numeric(data2stats[i,]) ~ gr)
multcomp <- TukeyHSD(aov.out)
pm[i,] <- as.matrix(multcomp$gr[,"p adj"])
}
rownames(pm) <- rownames(data2stats)
colnames(pm) <- rownames(multcomp$gr)
head(pm)
## day7-day28 termage-day28 termage-day7
## FT0013 3.858744e-01 0.8014029 7.466148e-01
## FT0024 2.210264e-01 0.5264500 7.551302e-01
## FT0046 2.721035e-02 0.9592086 3.241805e-02
## FT0189 4.369110e-05 0.9650340 3.744574e-04
## FT0190 6.247173e-01 0.3623413 1.608004e-01
## FT0202 2.120265e-05 0.8787176 3.136047e-05
p.val.adj.day7.day28 <- p.adjust(pm[,"day7-day28"],"fdr")
p.val.adj.termage.day28 <- p.adjust(pm[,"termage-day28"],"fdr")
p.val.adj.termage.day7 <- p.adjust(pm[,"termage-day7"],"fdr")
## Create a function to compute FC
fc.test <- function(df, classvec, class.case, class.control) {
coln <- names(df)
case <- df[which(coln == class.case)]
control <- df[which(coln == class.control)]
logFC <- log2(case/control)
FC <- case/control
FC2 <- -control/case
FC[FC < 1] <- FC2[FC < 1]
fc.res <- c(FC, logFC)
names(fc.res) <- c("FC", "logFC")
return(fc.res)
}
# Calculate median intensities
median.intensities <- t(apply(data2stats, 1, tapply, D.norm.backup2$class2, median))
# Calculate FC for 'day7-day28' groups
fc.day7.day28 <- as.data.frame(t(apply(median.intensities, 1, function(x)
fc.test(x, classvec = D.norm.backup2$class2, class.case = "day7", class.control = "day28"))))
colnames(fc.day7.day28) <- c("FC_day7_day28", "logFC_day7_day28")
# Calculate FC for 'termage-day28' groups
fc.termage.day28 <- as.data.frame(t(apply(median.intensities, 1, function(x)
fc.test(x, classvec = D.norm.backup2$class2, class.case = "termage", class.control = "day28"))))
colnames(fc.termage.day28) <- c("FC_termage_day28", "logFC_termage_day28")
# Calculate FC for 'termage-day7' groups
fc.termage.day7 <- as.data.frame(t(apply(median.intensities, 1, function(x)
fc.test(x, classvec = D.norm.backup2$class2, class.case = "termage", class.control = "day7"))))
colnames(fc.termage.day7) <- c("FC_termage_day7", "logFC_termage_day7")
idx_day7_day28 <- which(abs(fc.day7.day28$FC)> 2 & p.val.adj.day7.day28 < 0.05)
idx_termage_day28 <- which(abs(fc.termage.day28$FC)> 2 & p.val.adj.termage.day28 < 0.05)
idx_termage_day7 <- which(abs(fc.termage.day7$FC)> 2 & p.val.adj.termage.day7 < 0.05)
cat(paste(length(idx_day7_day28), "out of", dim(data2stats)[1]), " features significantly varying in untreated vs ctr comparison")
## 11 out of 246 features significantly varying in untreated vs ctr comparison
#11 out of 246 features significantly varying in untreated vs ctr comparison>
cat(paste(length(idx_termage_day28), "out of", dim(data2stats)[1]), "significantly varying in treated vs untreated comparison")
## 0 out of 246 significantly varying in treated vs untreated comparison
#0 out of 246 significantly varying in treated vs untreated comparison>
cat(paste(length(idx_termage_day7), "out of", dim(data2stats)[1]), "significantly varying in treated vs untreated comparison")
## 11 out of 246 significantly varying in treated vs untreated comparison
#11 out of 246 significantly varying in treated vs untreated comparison
# Draw Volcano plot for either comparisons
RES <- cbind.data.frame(median.intensities, fc.day7.day28, p.val.adj.day7.day28, fc.termage.day28,p.val.adj.termage.day28,fc.termage.day7,p.val.adj.termage.day7)
RES$labels <- rownames(RES)
par(mar = c(1, 1, 1, 1))
p1 <- ggplot(data = RES, aes(x = logFC_day7_day28, y = -log10(p.val.adj.day7.day28),label = labels)) +
geom_point(alpha = 0.4, size = 1.75) + theme(legend.position = "none") +
xlim(-10, 10) +
geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
geom_vline(xintercept = log2(2), linetype = "dashed") +
geom_vline(xintercept = -log2(2), linetype = "dashed") +
labs(x = "log2(FC)", y = "-log10(p.adj)", title = "day7vsday28") + theme_bw()
p2 <- ggplot(data = RES, aes(x = logFC_termage_day28,y=-log10(p.val.adj.termage.day28),label = labels)) +
geom_point(alpha = 0.4, size = 1.75) + theme(legend.position = "none") +
xlim(-10, 10) +
geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
geom_vline(xintercept = log2(2), linetype = "dashed") +
geom_vline(xintercept = -log2(2), linetype = "dashed") +
labs(x = "log2(FC)", y = "-log10(p.adj)", title = "termagevsday28") + theme_bw()
ggplotly(p2)
#FIGURE 26
## Summarize results and plot Volcano plot
RES <- cbind.data.frame(median.intensities, fc.day7.day28, p.val.adj.day7.day28, fc.termage.day28, p.val.adj.termage.day28,fc.termage.day7,p.val.adj.termage.day7)
#Error Generated
# Combine FC values as a criteria to sort features
#idx.s <- union(idx_day7_day28, idx_termage_day28, idx_termage_day7)
idx.s <- union(idx_day7_day28, idx_termage_day28)
idx.s <- union(idx.s, idx_termage_day7)
RES.sig <- RES[idx.s,]
#EXERCISE 4.1.2: Reproduce Figure 3F for Tryptophan for HILIC ESI(-) conditions. How do their levels compare to those from their metabolites? #Is there any rationale?
Figure 3F depicts boxplot of several tryptophan metabolites with altered abundance between experimental groups in plasma and feces. The tryptophan metabolites to be displayed are INDOLELACTIC ACID, TRYPTAMINE, 5-HYDROXYINDOLE.According to the authors, tryptamine, also a neuroactive TRP metabolite, was positively correlated with Enterococcus, while indole, the neuroactive TRP metabolite, was negatively correlated with Staphylococcus”.Therefore, we will retrieving MS1 putative hits using HMDB for these three metabolites as follows:
#A simple matching of feature m/z values against theoretical m/z values is in HMDB performed through MetaboAnnotation package.
#This package, defines high-level user functionality to support and facilitate annotation of MS-based metabolomics data.
#As a result, we obtain MS1 putative annotation of features. This is the lowest level of confidence in metabolite annotation.
#However, it gives ideas about potential metabolites that can be analyzed in further downstream experiments and analyses
#To keep the runtime of this tutorial short, MS1 match is done after feature filtering. However, MS1 annotation can be done before
#any filter is performed. This should eventually help to better control false positives during statistical comparison.
#First, we define the target compounds database (HMDB) in which we have only the monoisotopic mass available for each entry.
#We need to convert this monoisotopic mass to m/z values in order to match the m/z values from the features (i.e. the query m/z values)
#against them. For this we need to define the most likely ions/adducts that would be generated from the compounds based on the ionization
#mode used in the experiment. We assume the most abundant adducts from the compounds being “[M+H]+”. Notice that you can get a full list of
#supported adducts the MetaboCoreUtils::adductNames(polarity = "positive") function can be used.
#' Load HMDB as target database
HMDB_t <- readRDS(file = "C:/Users/User/Desktop/R/Metabolomics/Metabolomics/HMDB_t.rds")
#' Select plausible adducts according to polarity
adducts_negative <- MetaboCoreUtils::adductNames(polarity = "negative")[c(13,16,18,20,36,38)]
#' Select the query data
DF2ID <- cbind.data.frame(feature.info[rownames(RES.sig),], RES.sig)
DF2ID$features_id <- rownames(DF2ID)
DF2ID$mz <- DF2ID$mzmed
#We next perform the matching with the matchValues function providing the query and target data
#as well as a parameter object (in our case a Mass2MzParam) with the settings for the matching.
#With this Mass2MzParam, the monoisotopic mass of target compounds in the data base get first converted
#to m/z values, based on the defined adducts, and these are then matched against the query m/z values
#(i.e. the m/z values for the features).
#' Define parameters to perform the search
#' Use only [M-H] adduct
parmH <- Mass2MzParam(adducts = adducts_negative[1],tolerance = 0.005, ppm = 20)
#' Match values
matched_featuresH <- matchValues(query = DF2ID, target = HMDB_t , param = parmH)
#' Explore and interpret results
matched_featuresH
## Object of class Matched
## Total number of matches: 16
## Number of query objects: 16 (3 matched)
## Number of target objects: 216772 (16 matched)
#From the 16 features in the query 3 are matched against at least one target compound in the HMDB_t dataframe
#(all matches are against a single compound).
#matchedData returns a DataFrame with all rows in query and their corresponding matches
#in target along with the matching adduct (column "adduct") and the difference in m/z
#(colum "score" for absolute differences and "ppm_error" for the m/z relative differences).
#Note that if a row in query matches multiple elements in target, this row will be duplicated
#in the DataFrame returned by data. For rows that can not be matched NA values are reported.
##The tryptophan metabolites to be displayed like authors Figure 3F are INDOLELACTIC ACID, TRYPTAMINE, 5-HYDROXYINDOLE
#First, Find putative ID for features representing INDOLELACTIC ACID:
#INDOLELACTIC ACID (indole-3-lactic acid)
##Data based on following scientific references:
#https://hmdb.ca/metabolites/HMDB0000671
#Apex for INDOLELACTIC ACID = 254
#Metabolomics. 2015; 11(3): 696–706.
#Published online 2014 Sep 7. doi: 10.1007/s11306-014-0727-x
#PMCID: PMC4419193
#PMID: 25972771
#Predicting retention time in hydrophilic interaction liquid chromatography mass spectrometry and its use for peak annotation in metabolomics
#Mingshu Cao,corresponding author1 Karl Fraser,1 Jan Huege,1 Tom Featonby,1 Susanne Rasmussen,2 and Chris Jones
#Pediatr Res. 2020 Aug; 88(2): 209–217.
#Published online 2020 Jan 16. doi: 10.1038/s41390-019-0740-x
#PMCID: PMC7363505
#NIHMSID: NIHMS1547607
#PMID: 31945773
#Indole-3-lactic acid, a metabolite of tryptophan, secreted by Bifidobacterium longum subspecies infantis is anti-inflammatory in the immature intestine.
#Di Meng,1 Eduardo Sommella,2 Emanuela Salviati,2,4 Pietro Campiglia,2,3 Kriston Ganguli,1 Karim Djebali,1 Weishu Zhu,1 and W. Allan Walker
IDfeaturesMS1 <- matchedData(matched_featuresH)
# Find INDOLELACTIC ACID (indole-3-lactic acid) putative ID in MS1 considering 10 ppm
# error and rt window = 20 seconds.
M <- 205.073893223
H <- 1.007276
MH <- M - H
## Define INDOLELACTIC ACID peak mz range considering 10 ppm error
error <- 10
mmu_min <- MH-(error*MH/1e6)
mmu_max <- MH+(error*MH/1e6)
mzr <- c(mmu_min, mmu_max)
## Define taurine peak rt range width
apex.indolelacticacid <- 9.25*60
rt.window <- 20 #max rt width in seconds
rtr <- c(apex.indolelacticacid-rt.window, apex.indolelacticacid+rt.window)
## Plot raw data to check whether there is taurine
IDfeaturesMS1_indolelacticacid <- IDfeaturesMS1 |> as.data.frame() |> filter(between (mzmed, mzr[1], mzr[2])) |> filter(between (rtmed, rtr[1], rtr[2])) |> select(c(1,4,24:31))
#FIGURE 27
IDfeaturesMS1_indolelacticacid
## [1] mzmed rtmed logFC_termage_day28
## [4] p.val.adj.termage.day28 FC_termage_day7 logFC_termage_day7
## [7] p.val.adj.termage.day7 features_id mz
## [10] target_compound_id
## <0 rows> (or 0-length row.names)
#TRYPTAMINE
##Data based on following scientific references:
#https://hmdb.ca/metabolites/HMDB0000303
#Visualization and Identification of Neurotransmitters in Crustacean Brain via Multifaceted Mass Spectrometric Approaches
#Qinjingwen Cao, Yijia Wang, Bingming Chen, Lingjun Li
#ACS Chem. Neurosci. 2019, 10, 3, 1222–1229
#Publication Date:February 5, 2019
#https://doi.org/10.1021/acschemneuro.8b00730
IDfeaturesMS1 <- matchedData(matched_featuresH)
# Find TRYPTAMINE putative ID in MS1 considering 10 ppm error and rt window = 20 seconds.
M <- 160.100048394
H <- 1.007276
MH <- M - H
## Define TRYPTAMINE peak mz range considering 10 ppm error
error <- 10
mmu_min <- MH-(error*MH/1e6)
mmu_max <- MH+(error*MH/1e6)
mzr <- c(mmu_min, mmu_max)
## Define TRYPTAMINE peak rt range width
apex.tryptamine <- 5.92*60
rt.window <- 20 #max rt width in seconds
rtr <- c(apex.tryptamine-rt.window, apex.tryptamine+rt.window)
## Plot raw data to check whether there is tryptamine
IDfeaturesMS1_tryptamine <- IDfeaturesMS1 |> as.data.frame() |> filter(between (mzmed, mzr[1], mzr[2])) |> filter(between (rtmed, rtr[1], rtr[2])) |> select(c(1,4,24:31))
#FIGURE 28
IDfeaturesMS1_tryptamine
## [1] mzmed rtmed logFC_termage_day28
## [4] p.val.adj.termage.day28 FC_termage_day7 logFC_termage_day7
## [7] p.val.adj.termage.day7 features_id mz
## [10] target_compound_id
## <0 rows> (or 0-length row.names)
#5-HYDROXYINDOLE
#Data based on following scientific references: https://hmdb.ca/metabolites/HMDB0059805
#J Endocr Soc. 2021 Aug 1; 5(8): bvab106.
#Published online 2021 Jun 8. doi: 10.1210/jendso/bvab106
#PMCID: PMC8237842
#PMID: 34195530
#Comparison of Serum and Urinary 5-Hydroxyindoleacetic Acid as Biomarker for Neuroendocrine Neoplasms
#Anna Becker,1 Camilla Schalin-Jäntti,2 and Outi Itkonen
IDfeaturesMS1 <- matchedData(matched_featuresH)
# Find 5-hydroxyindole putative ID in MS1 considering 10 ppm
# error and rt window = 20 seconds.
M <- 133.052763851
H <- 1.007276
MH <- M - H
## Define 5-hydroxyindole peak mz range considering 10 ppm error
error <- 10
mmu_min <- MH-(error*MH/1e6)
mmu_max <- MH+(error*MH/1e6)
mzr <- c(mmu_min, mmu_max)
## Define 5-hydroxyindole peak rt range width
apex.hydroxyindole <- 5.92*60
rt.window <- 20 #max rt width in seconds
rtr <- c(apex.hydroxyindole-rt.window, apex.hydroxyindole+rt.window)
## Plot raw data to check whether there is hydroxyindole
IDfeaturesMS1_hydroxyindole <- IDfeaturesMS1 |>as.data.frame() |> filter(between (mzmed, mzr[1], mzr[2])) |> filter(between (rtmed, rtr[1], rtr[2])) |> select(c(1,4,24:31))
#FIGURE 29
IDfeaturesMS1_hydroxyindole
## [1] mzmed rtmed logFC_termage_day28
## [4] p.val.adj.termage.day28 FC_termage_day7 logFC_termage_day7
## [7] p.val.adj.termage.day7 features_id mz
## [10] target_compound_id
## <0 rows> (or 0-length row.names)
#Due to low match numbers, no data is available to plot and mimic authors Figure 3F
#ENVIPAT
#Using enviPat package to predict the theoretical isotopic distribution for at least one of the 3 metabolites of authors
#Figure 3F (tryptamine) #detected in ESI(-) mode as [M-H]- adduct.
#Find tryptamine molecular formula in HMDB "C10H12N2"
predicted.tryptamine.pattern<-enviPat::isopattern(isotopes,"C10H12N2",threshold=1,plotit=FALSE,charge=-1,emass=0.00054858,algo=1)
## done.
predicted.tryptamine.pattern$C10H12N2
## m/z abundance 1H 2H 14N 15N 12C 13C
## [1,] 160.1006 100.00000 12 0 2 0 10 0
## [2,] 161.1040 10.81573 12 0 2 0 9 1
# Using "Spectra" package to create spectrum from isotopic prediction
sp_tryptamine_predicted <- DataFrame(msLevel = c(1L),polarity = c(0L), id = c("HMDB0000303 "),name = c("Tryptamine"))
## Assign m/z and intensity values.
#sp_tryptamine_predicted$mz <- list(predicted.tryptamine.pattern$C10H12N2[c(1,3,5), 1])
#Error in predicted.tryptamine.pattern$C10H12N2[c(1, 3, 5), 1] :
# subscript out of bounds
sp_tryptamine_predicted$mz <- list(predicted.tryptamine.pattern$C10H12N2)
sp_tryptamine_predicted$intensity <- list(predicted.tryptamine.pattern$C10H12N2)
#theoretical <- Spectra(sp_tryptamine_predicted) |>setBackend(MsBackendDataFrame())
#Error in x[[1L]] : subscript out of bounds
#theoretical$name <- "Predicted"
#Run cliqueMS anotation on our featured data
#Find cliques
set.seed(2)
#ex.cliqueGroups <- getCliques(xdata, filter = TRUE)
#Creating anClique object
#Creating network
#Error: Cannot allocate vector of size 651.4 Gb
#Find isotopes
#ex.Isotopes <- getIsotopes(ex.cliqueGroups, ppm = 10)
# Load positive adducts information
#data(positive.adinfo)
#head(positive.adinfo)
#ex.Adducts <- getAnnotation(ex.Isotopes, ppm = 10,adinfo = positive.adinfo, polarity = "positive",normalizeScore = TRUE)
#Retrieve annotation results
#REScliqueMS <- getPeaklistanClique(ex.Adducts)
#Define the rt and m/z range of tryptamine peak
#M <- 160.100048394
#H <- 1.007276
#MH <- M + H
#' Define tryptamine peak mz range considering 10 ppm error
error <- 10
mmu_min <- MH-(error*MH/1e6)
mmu_max <- MH+(error*MH/1e6)
mzr <- c(mmu_min, mmu_max)
#' Define tryptamine peak rt range width
apex.tryptamine <- 5.92*60
rt.window <- 20 #max rt width in seconds
rtr <- c(apex.tryptamine-rt.window, apex.tryptamine+rt.window)
#Plot raw data to check whether there is tryptamine
#REScliqueMS |> filter(between (mz, mzr[1], mzr[2])) |> filter(between (rt, rtr[1], rtr[2]))
#Annotations for TRYPTAMINE in the entire dataset
#REScliqueMS[grep("1800", REScliqueMS$isotope),])
##EXERCISE 4.2:
#EXERCISE 4.2.1:Why do to think authors solely use protonated/deprotonated adducts?
“The most important qualitative rule to predict which fragment will retain the charge and how abundant will be in the spectrum is to investigate the proton affinity of their neutral counterparts.Field’s rule states that the higher is the proton affinity of a neutral, the higher is the intensity of its charged form in the spectrum. There is much less information on the fragmentation processes of deprotonated molecules in the literature partly because the formation of deprotonated molecules is usually limited to compounds possessing acidic protons (the bile acids and L-aspartic acid of our study)or compounds that could generate such ones by tautomerism. Other reasons include the poor understanding of fragmentation processes and technical difficulties. The signal intensity for ES− is usually lower, therefore whenever possible, use of ES+ is recommended. However, ES− is of high importance e.g., for the characterization of flavonoids, oligosaccharides, carboxylic acids, sulfonamides, oligonucleotides and rarely peptides. For a more comprehensive characterization, additional types of fragmentation can be explored: For exammple, the fragmentation characteristics of an [M + zH]z+ type protonated molecule (even-electron pre-cursor) and an M+ radical type molecular ion(odd-electro precursor) is significantly different. OE ions tend to fragment at more random sites, while even-electron ions usually produce less but more stable fragments by cleavages at the thermochemically least stable bonds. It is also possible to form an odd-electron ion from an ESI-generated multiply charged even-electron ion precursor with methods of electron capture dissociation (ECD) or electron transfer dissociation (ETD) which nowadays have widely-used application in sequencing of peptides and other biomolecule”
#EXERCISE 4.2.2: What would you suggest for a more comprehensive characterization instead? How would you suggest to improve the preprint?
The pre-print and related study can be improved by incorporating into the experimental design more appropriate selection of solvents (polar, non-polar) and expanding separation strategies to include those based on a multitude of properties (size, hydro-, hydrophilicity, relative volatility, negative and positive charge, solublity, affinity, color. This pre-print study can also be improved by furthering the current integratation of analysis of lipophilic compounds by reversed-phase liquid chromatography (RP-LC)/MS with analysis of water-soluble compounds by hydrophilic interaction liquid chromatography (HILIC)/MS to improve coverage, Method optimization and selection of the most optimal instrumentation platforms and ionization modes can also exapand coverage and detection limits.A reference-data-driven analysis to match metabolomics tandem mass spectrometry (MS/MS)data against metadata-annotated source data as a pseudo-MS/MS reference library has been proposed to incease the low 10 % annotation rate of human untargeted metabolomics studies. This appraoch can be applied to this current study, as it has been demonstrated to increase MS/MS spectral usage 5.1-fold over conventional structural MS/MS library matches and allows empirical assessment of dietary patterns from untargeted data. EnviPat::check_chemform() tools can be used to get closer to level 2 confidence when comparing spectra with background of similar empirical isotopic distributions informs The study and the metabolic community can be bolstered by gradually overcoming challenges associated with with MS/MS data interpretation, database content, isomer resolution, identification confidence, and FDR estimation.Specifically, and inceasing the number of Validations of retention times and MS/MS fragmentation data with a reference standards.
#Citation 1: Steckel A, Schlosser G. An Organic Chemist’s Guide to Electrospray Mass Spectrometric Structure Elucidation. Molecules. #2019 Feb 10;24(3):611. doi: 10.3390/molecules24030611. PMID: 30744143; PMCID: PMC6384780. #Citation 2: #Gauglitz, J.M., West, K.A., Bittremieux, W. et al. Enhancing untargeted metabolomics using metadata-based source annotation. #Nat Biotechnol 40, 1774–1779 (2022). https://doi.org/10.1038/s41587-022-01368-1 #https://doi.org/10.1038/s41587-022-01368-1 #Citation 3Souza AL, Patti GJ. A Protocol for Untargeted Metabolomic Analysis: From Sample Preparation to Data Processing. #Methods Mol Biol. 2021;2276:357-382. doi: 10.1007/978-1-0716-1266-8_27. PMID: 34060055; PMCID: PMC9284939.